ABSTRACT


This  thesis  focuses  on  the  design  and  analysis  of 
interprocess  communication  protocols  for  networks  of  computers. 
Previous  research  has  emphasized  system  performance  at  lower 
levels,  within  the  communication  medium  itself.  This  work 
examines  requirements  and  performance  of  protocols  for 
communication  between  processes  in  the  Host  computers  attached 
to  the  communication  system. 

Both  the  reliability  and  the  efficiency  of  protocols  are 
discussed. Reliability involves overcoming unreliable network
transmission facilities to avoid loss, duplication, or
out-of-order delivery of data. Reliability performance goals
are defined, and the correctness of different protocol
mechanisms in achieving these goals is demonstrated.
Consequences  of  protocol  failures  (Host  crashes)  and  problems  of 
initializing  control  mechanisms  required  for  reliable 
communication  are  also  considered. 

Efficiency  primarily  concerns  throughput  and  delay 
achievable  for  communication  between  remote  processes.  The 
performance of successively more powerful protocols including
error detection, retransmission, flow control, limited
buffering, and sequencing is analyzed. Protocol parameters such
as retransmission interval, window size, buffer allocation,
packet and acknowledgement strategy emerge as important
factors in determining efficiency. Several graphs showing
quantitative performance results for representative situations
are included.

An additional section of the thesis considers the
problems of interconnecting heterogeneous computer networks to
allow communication between processes in different networks.
Topics discussed include global addressing and routing
techniques, level of network interconnection, extent of changes
required in individual nets, and functions performed by the
interface or Gateway between networks.

Chapter I
INTRODUCTION

This thesis focuses on the design and analysis of
interprocess communication protocols for use in computer
networks. The feasibility and utility of computer networks have
been  clearly  demonstrated  recently  with  several  sophisticated 
nets  fully  operational  (NPL,  ARPA,  TYMNET,  ALOHA)  and  many 
others  planned  (CYCLADES,  EPSS,  SITA,  CANUNET,  AUTODIN  II)  (see 
Appendix  B).  However,  much  of  the  research  accompanying  these 
developments  has  emphasized  system  performance  within  the 
communication  network  itself.  Our  study  examines  the 
requirements for communication between processes in the Host
computers  attached  to  the  network. 

To  clarify  our  level  of  interest,  we  note  the  parallel 
history  of  computer  network  development  and  single  computer 
system  development.  In  single  computer  systems,  the  original 
emphasis was on "hardware" questions of memories, bus
structure,  basic  arithmetic  and  logic  operations,  etc. 
Eventually  such  hardware  design  problems  became  a specialty,  and 
efforts  to  provide  a more  convenient  interface  to  the  computer 
user  grew  in  importance.  Programming  languages,  operating 
systems,  and  time  sharing  were  born. 

Similarly, the first years of computer network
development have emphasized internal design questions such as
circuit topology and capacity [Frank70, Frank72a, Cerf75a],
routing [Frank71, Fultz72, McQuillan74], switching node
requirements [Fultz72, McQuillan72], reliability [VanSlyke72],
and congestion control [Davies72, Kahn72]. This emphasis is
most apparent in the ARPANET and related packet switching
network experience [Frank72, Metcalfe73]. Many of these
internal design problems now have a well developed theory and
practice [Kleinrock70, Frank72, Pyke73, Karp73, Kershenbaum74]
that serves to provide the basic communication facility or
transmission medium that interconnects network users at the
lowest level.

Unfortunately,  processes  attempting  to  communicate  with 
each other over a computer network face a problem similar to
humans  trying  to  use  a single  computer;  the  basic  or  "raw" 
facility  provided  is  often  too  primitive,  unreliable,  or 
otherwise  inconvenient.  The  traditional  approach  in  the 
computing  domain  has  been  to  create  an  operating  system  to 
bridge  the  gap  between  raw  machine  and  user  desires,  creating  a 
"virtual"  machine  that  is  much  more  powerful,  reliable,  and 
convenient. 

To  facilitate  interprocess  communication  over  a computer 
network,  a similar  "augmented"  communication  facility  must  be 
built upon the basic network services available. In fact, many
different levels of augmented service prove desirable for
various special communication purposes [Crocker72]. A partially
reliable "best effort" communication service represents the
lowest level, followed by a general purpose fully reliable
interprocess communication protocol, and finally various special
purpose services such as file transfer, remote job entry,
interactive terminal, and graphics.

Compared to the rigor of internal network design theory
and practice, the science of higher level protocol design is in
its infancy. This thesis focuses on the general interprocess
communication protocols which provide the basic facility on
which more specialized services will be built [Pouzin74c].

Although good interprocess communication facilities are
a necessary condition for the flexible resource sharing
envisioned by network architects [Roberts72, Kahn72a, Watson73,
McKay73], they are by no means sufficient. A wide range of
higher level problems in distributed system design such as
synchronization, file systems [Thomas73], task partitioning,
resource allocation, priority assignment [Boudon72], etc. remain
to be solved.

Resource sharing in the distributed environment of
computer networks imposes special demands on interprocess
communication facilities. Processes are seen as the active
elements in a distributed computing system. Human users at
terminals, I/O devices, file systems, service routines, and
operating systems are all represented by processes that
communicate to accomplish their goals. Processes are often
treated as equals for communication purposes rather than
requiring a "master" and "slave" relationship typical of polling
or centralized control systems (e.g. IBM's SDLC [Donan74,
Kersey74]). As opposed to traditional centralized systems where
reliability is often taken for granted, the distributed
environment of computer networks demands that the interprocess
communication facility pay explicit attention to assuring
reliability [Metcalfe72]. These considerations determine the
type of augmented service desirable, or performance goals for an
interprocess communication protocol in a computer network
environment.

Reliability and efficiency may be distinguished as two
main classes of performance goal. Reliability of the
interprocess communication protocol involves avoiding loss or
duplication of data transmitted, delivering data in the same
order as submitted, and properly initializing and terminating
data transfers for continued reliable operation. Efficiency
primarily concerns throughput and delay achievable for
communication between remote processes. These depend on the
operation of protocol mechanisms such as retransmission, flow
control, buffer allocation, sequencing, and fragmentation.

These performance goals define one side of the protocol
design problem, while the transmission medium characteristics
define the other. To the greatest extent possible, we take
transmission medium behavior, particularly the difficult
characteristics of packet switching nets (PSN), as a given set
of characteristics which must be dealt with by a protocol,
rather than assuming them away to simplify analysis. After a
summary  of  the  contents  of  this  thesis  in  section  1,  we  return 
to  discuss  transmission  medium  behavior  in  section  2.  This 
characterization  serves  as  a basis  for  protocol  design 
considerations  throughout  the  rest  of  this  work. 

1. SUMMARY

Chapters II and III present the major new results of
this study concerning reliability and efficiency of protocols
respectively. Our contribution is both methodological in
developing new analysis techniques, particularly in chapter II,
and substantive in presenting answers to protocol design
problems.  Chapter  IV  presents  a survey  of  recent  work  on 
protocols  suitable  for  the  interconnection  of  packet  switching 
networks  and  a discussion  of  the  interface  or  Gateway  between 
networks. 

Chapter  II  defines  reliability  performance  measures  and 
considers  the  protocol  mechanisms  necessary  to  achieve  various 
levels  of  reliability.  Interest  in  protocol  verification  has 
increased  recently,  with  several  authors  applying  various  proof 
techniques  to  verifying  certain  aspects  of  protocol  reliability 
[Postel74, Bochman75, Merlin75]. We employ less formal
techniques in order to achieve results for more realistic
assumptions about underlying transmission medium behavior. We
are  particularly  interested  in  developing  protocols  able  to 
overcome  the  potentially  hostile  transmission  characteristics  of 
packet  switching  networks,  and  our  analysis  covers  the  full 
range  of  network  behavior. 

The consequences of protocol failures (for example due
to Host crashes) are also considered, leading to an important
result that interprocess communication protocols cannot
guarantee "invisible" recovery from protocol failures. That is,
loss or duplication of data cannot be prevented by the protocol
itself after a failure; instead, higher level error recovery
procedures must be invoked.

Chapter II also considers the problems involved in
initializing the control information required by reliable
communication mechanisms. Sophisticated methods of selecting
initial  control  information  values  and  synchronizing  the  other 
side  of  a connection  prove  necessary  in  a potentially  hostile 
environment  such  as  a PSN.  Such  mechanisms  are  defined  and 
verified  using  a state  diagram  model  which  includes  the  state  of 
both  protocol  processes  (on  each  side  of  the  connection)  plus 
information  transmitted  between  processes. 

Chapter  III  defines  efficiency  performance  measures  of 
throughput and delay for interprocess communication protocols.
Successively more powerful protocols including retransmission,
flow control, limited buffering, and sequencing are analyzed to
determine the impact of protocol parameters such as
retransmission interval, window size, buffer allocation, and
fragment size on efficiency. Transmission medium
characteristics such as delay, bandwidth, and errors also
strongly affect efficiency. In Chapter III we are forced to
make some simplifying assumptions about transmission medium
characteristics.


Chapter  IV  considers  the  problem  of  interconnecting 
independent  computer  networks  to  provide  communication  between 
processes on computers in different networks. We compare
several approaches to network interconnection including those
requiring substantial changes to existing local net operations.
Techniques to implement internet addressing, routing, and other
protocol services discussed in chapters II and III on top of
local network facilities appear to be feasible without imposing
changes on individual nets.

2. TRANSMISSION MEDIUM CHARACTERISTICS

This  section  identifies  the  important  characteristics  of 
the  basic  transmission  medium  or  communication  facility  that  the 
protocol  must  use  in  providing  an  augmented  service.  The  six 
characteristics discussed emerged primarily through experience
with packet switching networks as the basic transmission medium,
and are most appropriate to that context. The transmission
medium is assumed to accept packets or blocks of data from a
source and make a "best effort" to deliver them to the
destination. That is, we limit our concern to packet
communication media [Metcalfe73], although not strictly to
packet switching nets. Many of the points raised apply to other
network technologies or even simple communication lines as well,
and an effort is made to include these points in the following.

The  transmission  medium  characteristics  discussed  are: 

1)  Variable  delay 

2) Duplication of packets

3)  Loss  and  damage  of  packets 

4)  Out-of-order  delivery 

5)  Packet  size 

6)  Bandwidth 

Delay

Transmission medium delay is the amount of time between
submission of a packet at the source and delivery of the packet
at the destination. Traditionally, total delay is decomposed
into transmission delay, or the time required to transmit all
the bits of a packet at the nominal rate of the transmission
medium, plus propagation delay, or the time it takes a bit to
travel through the transmission medium to the destination. On a
dedicated hardware line these delays are relatively constant.
If the transmission medium is shared among many users as is
usually the case, there may also be access or queuing delay
while a packet waits its turn to be transmitted.
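
The decomposition above can be illustrated with a small calculation.
The sketch below is only an illustration for a single line: the
function name, its parameters, and the numbers in the example are
assumptions chosen for the arithmetic, not values from this work.

```python
def total_delay(packet_bits, bit_rate_bps, propagation_s, queuing_s=0.0):
    """Return the delay in seconds for one packet on one line.

    Hypothetical helper: total delay is taken as transmission delay
    plus propagation delay plus any access or queuing delay.
    """
    # Transmission delay: time to clock every bit onto the line.
    transmission_s = packet_bits / bit_rate_bps
    return transmission_s + propagation_s + queuing_s


# Illustrative numbers only: a 1000-bit packet on a 50 kbit/s line,
# 10 msec propagation delay, 5 msec waiting in a queue.
delay = total_delay(1000, 50_000, 0.010, 0.005)
```

With these assumed numbers the transmission delay (20 msec) dominates;
on a shared medium the queuing term is the one that varies from
packet to packet.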

In  store-and-forward  packet  switching  networks  with  many 
nodes,  the  packet  experiences  combinations  of  these  delays  at 
every  hop,  and  as  competing  traffic  level  increases,  larger 
access  or  queuing  delays  occur  at  each  node.  Errors  followed  by 
retransmissions  between  nodes,  and  alternate  routing  of  packets 
also contribute to variations in the delay experienced by
different packets. Without specifying the global traffic
pattern and routing algorithm, it is difficult to detail the
delay distribution, but in general variations equal to or
greater than the mean delay are typical [Forgie75].

Satellite links with high bandwidths impose large
propagation delays (about 250 msec) so retransmission delay
becomes a more important source of variation, although satellite
error rates are lower than those of ground lines [Sastry74]. Loop
nets provide relatively constant delay characteristics with access
delay accounting for the major variability.

An  important  delay  characteristic  for  protocol  design  is 
the  maximum  propagation  time  or  packet  lifetime  in  the 
transmission  medium,  represented  by  L.  For  simple  data  links 
and  loop  nets,  packet  lifetime  is  nearly  constant  and  determined 
largely  by  line  length  and  physical  transmission  properties.  In 
packet  switching  networks  with  occasional  routing  anomalies  and 
other  malfunctions,  L may  be  orders  of  magnitude  greater  than 
the  normal  or  mean  propagation  times.  As  discussed  in  chapter 
II,  protocols  must  always  be  wary  of  "old"  packets  arriving,  so 
a minimum L is desirable. One means suggested to achieve this
is  a packet  which  self-destructs  after  a specified  time  in  the 
net.  Further  consideration  of  such  transmission  medium  problems 
is  beyond  the  scope  of  this  work,  and  L is  taken  as  one  of  the 
given  characteristics  of  the  transmission  medium. 
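
One way to picture a self-destructing packet is sketched below. This
is a minimal illustration, not a mechanism specified in this work:
the `Packet` class, its `born_at` field, and the `forward` function
are hypothetical, and the sketch assumes that every node can compare
a packet's submission time against a common clock.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    payload: bytes
    born_at: float  # hypothetical field: time the packet entered the net


def forward(packet, now, max_lifetime):
    """Return the packet if it may still travel, else None.

    A node applying this check bounds the packet lifetime L by
    max_lifetime (assuming clocks agree closely enough).
    """
    age = now - packet.born_at
    if age > max_lifetime:
        return None  # the packet "self-destructs"
    return packet


L = 30.0  # assumed lifetime bound, in seconds
fresh = forward(Packet(b"a", born_at=100.0), now=110.0, max_lifetime=L)
stale = forward(Packet(b"b", born_at=100.0), now=140.0, max_lifetime=L)
```

Here `fresh` is passed on and `stale` is dropped. A hop counter
decremented at each node would be an alternative that avoids the
clock assumption, at the cost of bounding hops rather than time.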


Duplication

A single packet submitted for transmission may be
duplicated by the transmission medium and more than one copy
delivered to the destination. Normally the transmission medium
removes any duplicates it generates (by internal
retransmissions), but certain line or node failures at critical
moments can result in duplicates emerging at the destination.

Loss and Damage

A great deal of work has been done to characterize the
errors occurring on real transmission lines of various types
[Townsend64, Benice64a, Trafton71, Burton72]. Detecting damaged
packets is a well developed problem in coding theory. Without
further discussion, we assume that the protocol designer has
techniques available to decode packets and detect errors in
packets to any desired degree of reliability. Davies and Barber
(1973) survey the problem of error characteristics and suitable
coding techniques. Higher reliability requires longer codes,
increasing the overhead to transmit a packet, and reducing
effective bandwidth as discussed in chapter III.

On hardware lines, unless the line is completely open,
it is safe to assume that a transmitted packet will arrive
either  intact  or  damaged.  In  networks  with  multiple  lines  and 
nodes  in  the  transmission  path,  the  possibility  of  total  packet 
loss  is  finite  and  must  be  explicitly  recognized.  Some  networks 
even  discard  packets  as  a means  of  internal  congestion  control. 

Normally  a protocol  will  discard  damaged  packets  and 
wait  for  or  request  their  retransmission.  This  converts  the 
problem  of  packet  damage  to  packet  loss,  treating  both  with  the 
same  mechanism  (retransmission).  However,  some  applications  may 
tolerate  the  delivery  of  damaged  packets  to  the  protocol  user, 
and/or  loss  of  some  packets. 
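
The conversion of damage into loss can be sketched as follows. The
choice of a 32-bit CRC, the packet layout (checksum followed by
payload), and the function names are assumptions made for the
illustration; this text does not prescribe a particular code.

```python
import zlib


def make_packet(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC-32 (assumed layout)."""
    checksum = zlib.crc32(payload).to_bytes(4, "big")
    return checksum + payload


def receive(packet: bytes):
    """Return the payload if the packet is intact, else None.

    Returning None for a damaged packet makes damage look exactly
    like loss to the rest of the protocol, so retransmission can
    handle both cases.
    """
    if len(packet) < 4:
        return None
    checksum, payload = packet[:4], packet[4:]
    if zlib.crc32(payload).to_bytes(4, "big") != checksum:
        return None  # damaged: discard silently
    return payload


good = make_packet(b"hello")
# Simulate damage in transit by flipping one bit of the last byte.
damaged = good[:-1] + bytes([good[-1] ^ 0x01])
```

In this sketch `receive(good)` yields the payload, while
`receive(damaged)` yields nothing, just as if the packet had never
arrived.
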

Ordering

Packets may arrive at the destination in a different
order than they were submitted to the transmission medium. In a
packet switching net with multiple paths from source to
destination and alternate routing of packets, a packet submitted
later may travel by a shorter route and arrive sooner than a
packet submitted earlier. In line switched nets or on simple
data links where all packets follow the same route,
retransmission to correct internal errors may cause the
retransmitted packet to arrive after one or more later packets
transmitted successfully the first time. Some networks provide
an ordering facility at the destination, implementing an
end-to-end sequencing service within the network communication
facility.
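
An ordering facility of this kind can be sketched as a small
resequencing buffer. The class below is an illustration under
simplifying assumptions: sequence numbers start at zero and grow
without bound, and buffer space is unlimited. The name `Resequencer`
and its methods are hypothetical.

```python
class Resequencer:
    """Hold out-of-order packets until the gap before them is filled."""

    def __init__(self):
        self.next_seq = 0   # next sequence number to deliver
        self.pending = {}   # seq -> data for packets that arrived early

    def accept(self, seq, data):
        """Record an arrival; return the data now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []       # duplicate of something already seen
        self.pending[seq] = data
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


r = Resequencer()
out = []
# Packet 1 overtakes packet 0, and packet 1 later arrives again.
for seq, data in [(1, "b"), (0, "a"), (2, "c"), (1, "b")]:
    out.extend(r.accept(seq, data))
```

After the four arrivals, `out` holds the data in submission order
with the duplicate suppressed.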

Packet  Size 

The transmission medium normally has a maximum size
packet that it will accept. If a process wishes to send a chunk
of data larger than this size, the protocol must fragment the
chunk into pieces small enough for transmission, and reassemble
the chunk, or at least deliver the fragments in order at the
destination. The transmission medium itself may have to
fragment packets for transmission on certain links (particularly
likely if different nets are connected together), so there may
be layers of fragmentation and reassembly, or a uniform
fragmentation scheme used at all levels which requires
reconstruction only at the destination.
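
Fragmentation and reassembly can be sketched as below. The fragment
format (byte offset, a "more fragments" flag, and the piece itself)
is an assumption made for the illustration, as are the function
names. The sketch ignores header overhead, and it assumes duplicate
fragments have already been removed.

```python
def fragment(chunk: bytes, max_size: int):
    """Split chunk into (offset, more, piece) fragments of at most max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    frags = []
    for offset in range(0, len(chunk), max_size):
        piece = chunk[offset:offset + max_size]
        more = offset + max_size < len(chunk)  # True unless this is the last piece
        frags.append((offset, more, piece))
    return frags


def reassemble(frags):
    """Rebuild the chunk from fragments received in any order.

    Returns None while any fragment is still missing.
    """
    ordered = sorted(frags)
    data = b""
    for offset, more, piece in ordered:
        if offset != len(data):
            return None  # a gap precedes this fragment
        data += piece
    if not ordered or ordered[-1][1]:
        return None      # the final fragment has not arrived yet
    return data


frags = fragment(b"abcdefghij", 4)          # three fragments: 4 + 4 + 2 bytes
whole = reassemble(list(reversed(frags)))   # arrival order does not matter
partial = reassemble(frags[:2])             # last fragment still missing
```

Because each fragment carries its offset in the original chunk, the
same format could be reused if a fragment were split again on a later
link, which corresponds to the uniform scheme that requires
reconstruction only at the destination.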

The  optimization  of  packet  size  involves  consideration 
of  line  rates,  error  characteristics,  traffic  patterns,  and 
performance objectives [Metcalfe73, Crowther74]. At the
interprocess communication protocol design level, we assume a
given maximum packet size, P.

Bandwidth 

The  transmission  medium  accepts  packets  at  a nominal  bit 
rate  0.  In  a packet  switching  net  this  rate  is  usually  the 
hardware line capacity from the Host to the packet switch.
However,  the  transmission  medium  may  become  "unavailable"  at 
times  due  to  internal  congestion  control  mechanisms,  and  the 
effective  bit  rate  offered  is  further  reduced  by  framing  and 
control  information  required  on  the  line.  These  considerations 
affect  any  protocol  using  the  transmission  medium,  and  will  not 
be  considered  further. 

Chapter II


In this chapter we study the reliability of interprocess
communication protocols. As discussed in chapter I, this
requires a clear understanding of protocol performance goals and
transmission medium characteristics relevant to reliability
since a protocol must bridge the gap between services available
and facilities desired.

We are particularly interested in treating the full
range of transmission medium behaviour that protocols may have
to cope with. While other authors have assumed well-behaved
transmission media in order to apply formal analysis techniques,
we are more interested in developing protocol mechanisms
suitable for worst case situations in real packet switching
network environments. We return to this difference in goals in
section 1.2 below.

Transmission  medium  characteristics  most  relevant  to 
reliability  include  delay,  loss,  damage,  duplication,  and 
out-of-order  delivery  of  information  (cf  chapter  I).  The  basic 
function  of  an  interprocess  communication  protocol  is  to  mask 
these undesirable characteristics and provide a reliable and
convenient communication path between processes (see figure 1).


FIGURE 1 AUGMENTED SERVICE FROM INTERPROCESS
COMMUNICATION PROTOCOL (IPC)

Hence an important goal of protocol analysis is to demonstrate
that a candidate protocol or class of protocols indeed provides
the desired reliability: i.e. does not duplicate packets, lose
packets, or deliver them out of order.


Protocol verification is one part of the complete
protocol design process. Successful protocol design also
requires a clear specification of performance goals, development
of mechanisms to achieve those goals, and evaluation of
alternative mechanisms. Evaluation includes verification that
the performance goals are met when the protocol is functioning
normally. Unfortunately, normal computer system operations are
occasionally disrupted by catastrophic failures (hardware
faults, deadlocks, protection violations, restarts, etc.).
Hence another important consideration of protocol analysis
concerns the results of protocol failures. By protocol failures
we mean the (rare) malfunction of a normally correct protocol
due to some external catastrophe, rather than a protocol that
normally functions incorrectly due to a flaw in its algorithms.

A third component of protocol evaluation concerns
initialization of the protocol mechanisms used to overcome
transmission medium deficiencies. Since this initialization may
have to be performed over the same unreliable transmission
medium, it presents a difficult synchronization problem.

These four topics are the main subject matter of chapter
II:

(1) Definition of performance goals and protocol mechanisms
to achieve reliability.

(2) Verification of correct protocol operation under normal
circumstances.

(3) Results of protocol failures.

(4) Protocol initialization requirements and techniques.

Whenever possible, the cost of the various solutions to these
problems is discussed.

Section  2 outlines  the  important  parts  of  an 
interprocess  communication  protocol,  and  defines  reliability 
performance  measures  (efficiency  measures  are  considered  in 
chapter  III).  Section  3 describes  a simple  protocol  to  avoid 
loss and duplication of packets, verifies the reliability of
this protocol, and explores the consequences of protocol
failures. Section 4 extends the analysis to a more powerful
protocol including a sequencing mechanism that correctly orders
delivered packets.

Analysis  of  the  protocols  specified  in  sections  3 and  4 
shows  that  when  both  sides  of  the  protocol  function  correctly, 
loss,  duplication,  and  ordering  problems  are  eliminated. 
However,  when  one  side  of  the  protocol  fails  (memory  loss)  as 
would  occur  in  a Host  crash/restart,  we  prove  that  loss  or 
duplication  of  packets  may  occur.  That  is,  it  is  impossible  to 
guarantee  error-free  recovery  after  a failure  by  either  side  of 
a connection. 

In section 5 we consider the additional problems of
initializing the mechanisms which achieve reliable
communication. This initialization is widely referred to as
connection establishment, and presents a difficult
synchronization problem since it must be accomplished using the
unreliable basic communication facility. The concepts of
connection and connection state are defined, and mechanisms to
reliably establish connections (initialize protocols) are
specified and classified. We demonstrate the limitations of
various mechanisms and prove the robustness of the "3-way
handshake" mechanism for connection establishment [Tomlinson74,
Dalal74] using a composite state diagram model. Consequences of
failures  (memory  loss)  and  failure  recovery  techniques  are  also 
considered. 
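
The idea behind a 3-way handshake can be sketched informally as
follows. This is not the specification given in section 5: the
message names (SYN, SYN-ACK, ACK, RESET), the dictionary layout, and
the function names are assumptions borrowed from later common usage,
and the sketch covers only the exchange of initial sequence numbers
(ISNs), not the full connection state.

```python
def responder_on_syn(syn_isn, my_isn):
    """Answer a connection request by echoing its ISN and proposing our own."""
    return {"type": "SYN-ACK", "isn": my_isn, "ack": syn_isn}


def initiator_on_syn_ack(reply, my_current_isn):
    """Confirm only if the reply echoes the ISN of our current attempt."""
    if my_current_isn is not None and reply["ack"] == my_current_isn:
        return {"type": "ACK", "ack": reply["isn"]}
    # The reply answers a request we are not making (an old duplicate).
    return {"type": "RESET", "ack": reply["isn"]}


def responder_on_final(msg, my_isn):
    """The connection is established only by an ACK of our own ISN."""
    return msg["type"] == "ACK" and msg["ack"] == my_isn


# Normal case: the initiator is currently trying to open with ISN 100.
reply = responder_on_syn(syn_isn=100, my_isn=700)
ok = responder_on_final(initiator_on_syn_ack(reply, my_current_isn=100), my_isn=700)

# An old duplicate request (ISN 42) emerges from the net while the
# initiator has no open attempt; the third message rejects it.
stale_reply = responder_on_syn(syn_isn=42, my_isn=701)
stale = responder_on_final(
    initiator_on_syn_ack(stale_reply, my_current_isn=None), my_isn=701
)
```

In the sketch `ok` is true and `stale` is false: the third message is
what lets the responder distinguish a current request from an old
packet, which a two-message exchange could not do.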


1.1 Related Work

"Closed loop" or "feedback correction" type protocols
suitable for overcoming the transmission medium characteristics
discussed in chapter I have been widely treated in the
literature [Benice64a, Lynch68, Stutzman72, Burton72,
Metcalfe73]. The ARQ type protocol is more suitable for
hardware lines where complete packet loss is impossible, since
it requires either a positive acknowledgement (ACK) or negative
acknowledgement (NACK or Retransmission Request) for every
packet sent.

Several authors have used state diagrams to model simple
ARQ protocols [Lynch68, Bartlett69, Birke71, Bochman74]. Lynch
presents an informal proof that these protocols provide reliable
communication  over  well  behaved  transmission  media  that  never 
lose  packets,  duplicate  packets,  or  deliver  packets  out  of 
order.  Recently  Bochman  has  analyzed  the  same  protocol  by  the 
method  of  action  sequences.  Seidler  (1975)  presents  a more 
formal  model  for  analysis  of  ARQ  type  protocols. 

A Positive Acknowledgement, Retransmission on timeout
(PAR)  type  protocol  is  more  suitable  in  a packet  switching  net 
environment  where  data  packets  or  acknowledgements  may  be 
completely  lost,  since  these  protocols  do  not  require  a NACK  to 
stimulate  retransmission.  Forward  error  correction  may  be  used 
in  addition  to  error  detection  on  noisy  channels  to  reduce 
retransmission [Benice64b, Sastry74]. Metcalfe provides an
excellent  summary  of  the  motivation  for  and  suitability  of  PAR 
protocols  [Metcalfe73  pp.  3-4  to  3-11].  Kalin  (1971)  has  also 
discussed  several  important  considerations  in  protocol 
reliability  including  protocol  initialization. 
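
The behaviour of a PAR protocol can be sketched with a small
simulation. The version below is a deliberately simple stop-and-wait
form (one outstanding packet) over a medium that loses data packets
and acknowledgements independently; it is not one of the protocols
analyzed later in this chapter. The function name, the `loss_rate`
and `seed` parameters, and the use of a random number generator to
model loss are all assumptions made for the illustration.

```python
import random


def par_transfer(messages, loss_rate, seed=0, max_tries=1000):
    """Send messages one at a time, retransmitting until each is acknowledged.

    Returns (delivered, transmissions). A lost packet or lost ACK is
    modelled as a timeout at the sender; no NACK is ever needed.
    """
    rng = random.Random(seed)
    delivered = []
    expected = 0        # receiver: next sequence number it will accept
    transmissions = 0
    for seq, msg in enumerate(messages):
        for _ in range(max_tries):
            transmissions += 1
            if rng.random() < loss_rate:
                continue            # data packet lost; sender times out
            if seq == expected:
                delivered.append(msg)   # new packet: deliver it once
                expected += 1
            # Otherwise it is a duplicate: discard it but still acknowledge.
            if rng.random() < loss_rate:
                continue            # ACK lost; sender times out and resends
            break                   # ACK received; go on to the next message
        else:
            raise RuntimeError("gave up after max_tries transmissions")
    return delivered, transmissions


delivered, n = par_transfer(list("abcde"), loss_rate=0.3)
```

Every message is delivered exactly once and in order, at the cost of
extra transmissions whenever a packet or its acknowledgement is lost.
The sequence number check at the receiver is what prevents a
retransmission caused by a lost acknowledgement from being delivered
twice.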

Postel (1974) has analyzed some simple examples of PAR
protocols  for  "proper  termination."  This  analysis  shows  that 
the  specified  protocol  functions  correctly  in  avoiding  loss, 
duplication,  and  out-of-order  delivery.  Postel’s  work  does  not 
treat  the  general  class  of  PAR  protocols  or  examine  the 
consequences  of  protocol  failures.  A specific  connection 
establishment procedure (the ARPANET Initial Connection
Protocol) is shown to have a race condition, but the general
question  of  connection  establishment  is  not  treated. 

Merlin  (1974,  1975)  has  used  Petri  nets  and  their 
corresponding  "token  machines"  to  show  that  a simple  class  of 
PAR  protocols  is  "recoverable"  from  loss  or  duplication  of 
packets  by  the  transmission  medium,  and  that  packets  are 
delivered  in  order.  Connection  establishment  and  protocol 
failures  are  not  discussed,  although  the  analysis  technique  may 
be  applicable  to  some  types  of  failures. 

Bochman  (1974)  has  analyzed  some  simple  PAR  protocols 
using  the  "action  sequences"  associated  with  a state  diagram 
model.  He  has  also  explored  an  algorithmic  protocol 
specification  as  a basis  for  both  assertion  proof  techniques  and 
protocol implementation by structured programming [Bochman75].

LeMoli (1973) has proposed a "colloquy" model for
protocol  specification  consisting  of  a finite  state  machine  with 
clearly  defined  user  interface  on  one  side  and  network 
communication  interface  on  the  other  side.  Danthine  and  Bremer 
(1975a,  1975b)  have  extended  this  model  to  facilitate  simulation 
of  protocols. 

Gilbert and Chandler (1972) have treated the interaction
of parallel processes by defining a "composite state" including
the state of each process and the values of shared variables.
Bredt (1973) has extended this model to allow infinite numbers
of processes or infinite values for variables. The requirement
for shared variables between processes prevents the direct
application of these techniques to communication protocols.
However, in section 5 we have modified this model to allow
message-based interprocess communication and have applied the
extended model to verify a complex protocol initialization
mechanism.

Day  (1975)  has  begun  research  to  determine  the  issues 
involved in designing "resilient" protocols. He suggests that
protocol specification techniques are of primary importance, and
that several approaches to verification may prove useful
including formal modeling, program proving, implementation aids,
and implementation testers or exercisers.

Members of IFIP WG6.1 (INWG) have been active in
developing interprocess communication protocols. Researchers at
Stanford University and Bolt Beranek and Newman have been
particularly interested in protocol reliability [Cerf74b,
Dalal74, Sunshine74, Belsnes74a, Tomlinson74, McKenzie74].


1.2 Protocol Specification and Verification Techniques

Analysis and design of communication protocols requires
a clear protocol specification. A good protocol specification
must ultimately serve several purposes, including definition,
verification, simulation, implementation, and documentation of
the algorithms involved [Danthine75b, Bochman75]. We do not
attempt to develop a complete theory of protocol specification,
or of protocol verification, although both specification and
verification are important in the broader performance analysis
we seek.

We have used different protocol specification techniques
for the different performance topics treated in this thesis in
order to most clearly define the protocol behaviour relevant to
each topic. Sections 3 and 4 employ a flowchart or algorithmic
protocol specification. Section 5 uses a state diagram
specification consistent with the exchanges of control
information required to initialize and terminate connections.
Appendix  A develops  a detailed  protocol  specification  model 
based  on  state  diagrams  with  additional  "context"  information. 

Authors primarily interested in verification have
employed formal models such as Petri nets [Merlin74], UCLA
Graphs [Postel73], and state diagrams [Bochman74, Lynch68]. By
specifying a protocol in terms of one of these formal models,
the powerful general theory developed for these abstract models
may be brought to bear on a particular aspect of protocol
verification. These techniques have succeeded in verifying some
facets of protocol reliability assuming reasonably well-behaved
transmission media.

Unfortunately, both the complexity of more powerful
protocols and more hostile transmission media are beyond the
capabilities of current formal models. The explosion of states
or nodes required to represent more complex protocols causes
some problems. Other difficulties arise in trying to
incorporate transmission media allowing total loss (as opposed
to damage) of packets, large amounts of internal storage,
internal duplication, and out-of-order delivery of packets.

We have been forced to abandon some of the rigor of
formal definitions and models in order to achieve results of
broader scope. Our protocol specification techniques include
prose, algorithms, flow charts, and state diagrams. Our proof
techniques include decomposition into simple modules, exhaustive
or complete test input sets, assertion proving [Floyd67, Naur66,
Hoare69], and the formal models mentioned above. An advantage
of the informal assertion techniques used throughout this
chapter is that in many cases they can be used to demonstrate
protocol failure consequences and initialization requirements in
addition to correctness under normal operation.


2. PROTOCOL MECHANISMS AND PERFORMANCE GOALS

In this section we present an outline of the mechanisms
employed in typical positive acknowledgement, retransmission
(PAR) type protocols. The description is purposely broad in
order to encompass a wide class of protocols. Sections 3 and 4
examine more detailed examples of PAR protocols.

After  outlining  protocol  mechanisms,  we  define  four 
performance  measures  used  in  later  sections  to  evaluate  protocol 
reliability.  These  measures  relate  to  loss,  duplication,  and 
out-of-order  delivery  of  packets. 


2.1  Protocol  Definition 

A PAR interprocess communication protocol consists of a
sending discipline, a receiving discipline, and a transmission
medium for sending packets (messages, letters, finite length bit
strings) between processes.

The  sending  discipline  accepts  packets  from  a process, 
attaches  any  control  information  used  by  the  protocol  to  achieve 
reliable  communication,  and  passes  the  packet  to  the 
transmission  medium.  (Submitted  packets  may  be  fragmented  into 
smaller  pieces  before  transmission.) 

Normally the sending discipline will retransmit each
packet at intervals determined by a retransmission timeout
parameter R, until a positive acknowledgement (ACK) is received.
Then the process is notified that the packet has been
successfully delivered. Another parameter, the quit time D,
determines when the sending discipline should give up and report
possible failure to deliver a packet.

The receiving discipline receives packets from the
transmission medium and uses the control information to
eliminate duplicates, reassemble or reorder fragments, and
deliver packets to the process in order. Successfully received
packets are acknowledged. The transmission medium accepts
packets from the sending discipline and delivers them to the
receiving discipline subject to the delay, loss, duplication,
ordering, size, and bandwidth characteristics discussed in
chapter 1.

Since a PAR protocol provides bi-directional
communication between two processes, a sending and receiving
discipline are required on each side of the communication path.
The protocol at each side must be initialized (control
information set up) as discussed in section 5 before reliable
communication can begin.


2.2 Performance Measures

(1) DELIVERY: A protocol successfully delivers if every
packet submitted by the source process is eventually delivered
(undamaged) to the destination process. A protocol fails to
deliver a packet if a packet submitted to the protocol is not
delivered (undamaged) to the destination process.

(2) LOSS: A protocol loses a packet if it reports successful
delivery of a packet to the sending process when in fact the
packet has not been successfully delivered to the destination
process. A protocol does not lose packets if it reports
successful delivery only if the packet was in fact successfully
delivered.

(3) DUPLICATION: A protocol duplicates packets if a single
packet submitted by the source process is delivered more than
once to the destination process. A protocol does not duplicate
packets if every packet submitted is delivered at most once.
(If the process submits the same message twice, both copies will
be delivered at the other end; this is not duplication.)

(4) ORDERING: A protocol delivers packets in order if packets
are delivered to the destination process in exactly the same
order that they were submitted by the source process. A
protocol delivers packets out of order if packets are delivered
in a different order than they were submitted.
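
The four measures can be stated operationally. The following is a minimal sketch, not part of the original specification: it assumes that every submission carries a unique label (so a message submitted twice gets two labels), that only submitted labels are ever delivered, and that a trace of one run is available. The function name and its arguments are hypothetical.

```python
from collections import Counter


def check_measures(submitted, delivered, reported_ok):
    """Classify one run against the four reliability measures.

    submitted   -- labels in the order the source process submitted them
    delivered   -- labels in the order the destination process received them
    reported_ok -- labels for which the protocol reported successful delivery
    """
    counts = Counter(delivered)
    delivered_set = set(delivered)
    # First delivery of each label, in arrival order (repeats dropped).
    first_deliveries = list(dict.fromkeys(delivered))
    # Order in which the delivered labels were originally submitted.
    expected_order = [p for p in submitted if p in delivered_set]
    return {
        "failed_to_deliver": [p for p in submitted if p not in delivered_set],
        "lost": [p for p in reported_ok if p not in delivered_set],
        "duplicated": [p for p in submitted if counts[p] > 1],
        "in_order": first_deliveries == expected_order,
    }
```

For example, submitting a, b, c and observing deliveries b, a, a while success is reported for all three yields a failure to deliver c, a loss of c, a duplication of a, and out-of-order delivery.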

In this section we consider a class of simple PAR
protocols without sequencing, fragmentation, flow control, or
connection establishment. In particular we note that this class
of protocols provides no mechanism for sequencing packets that
may arrive out of order.

First we define this class of protocols using the
general outline of a sending discipline and a receiving
discipline presented in section 2:


SENDING  DISCIPLINE  (see  figure  2): 

Each packet submitted by the source process is assigned a
unique identifier. (We temporarily ignore the problems of
an infinite ID space.) The packet is transmitted, and a
copy is retained.

Arriving ACK’s are checked for errors, and damaged ones
discarded.

When an ACK referencing this identifier is received, the
retained copy is discarded (and the source process notified
of success). If no ACK is received within the
retransmission timeout period R, the copy is again
transmitted and the cycle repeated. If the quit time has
been exceeded, retransmission is suspended (and the sending
process notified).

ACK’s for discarded packets are ignored.


RECEIVING  DISCIPLINE  (see  figure  3): 

Each packet received from the transmission medium is checked
for errors and discarded if damaged.

If not damaged, the packet’s ID is added to the list of
received-packet ID’s and an ACK referencing the identifier
is transmitted. (We temporarily ignore the fact that the
received-packet list size increases as more packets are
received.) If the packet’s identifier is already in the
list, the packet is discarded as a duplicate. Otherwise the
packet is delivered to the process.

FIGURE 2 PAR PROTOCOL SENDING DISCIPLINE

FIGURE 3 PAR PROTOCOL RECEIVING DISCIPLINE

TRANSMISSION  MEDIUM: 

Characterized by such parameters as delay, maximum packet
lifetime in medium, bandwidth, (non unity) loss probability,
and (non unity) damage probability. The complications of
addressing, routing, and multiplexing many connections over
a single path are ignored here; the protocol is defined for
a single connection.

The protocol is initialized when both sides have empty
received-packet lists and no packets have been sent. (How to
reliably accomplish this is discussed in section 5.)
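
The two disciplines can be sketched in code. The following is a minimal illustration under stated assumptions, not the specification itself: time is passed in explicitly as `now`, identifiers are consecutive integers, the transmission medium is left to the caller, and a packet whose quit time is exceeded is simply dropped after the process is notified (the text only says retransmission is suspended). The class and method names are hypothetical.

```python
class ParReceiver:
    """Receiving discipline: drop damaged packets, ACK the rest, deliver each ID once."""

    def __init__(self):
        self.received_ids = set()   # grows without bound, as noted in the text
        self.delivered = []         # packets handed to the destination process

    def receive(self, packet_id, data, damaged=False):
        """Return the ID to acknowledge, or None if the packet was damaged."""
        if damaged:
            return None
        if packet_id not in self.received_ids:
            self.received_ids.add(packet_id)
            self.delivered.append(data)
        return packet_id            # duplicates are acknowledged again


class ParSender:
    """Sending discipline with retransmission interval R and quit time D."""

    def __init__(self, retransmit_interval, quit_time):
        self.R = retransmit_interval
        self.D = quit_time
        self.next_id = 0
        self.pending = {}           # id -> [data, first_sent, last_sent]

    def submit(self, data, now):
        """Assign a unique ID, retain a copy, return the packet to transmit."""
        packet_id = self.next_id
        self.next_id += 1
        self.pending[packet_id] = [data, now, now]
        return packet_id, data

    def on_ack(self, packet_id, damaged=False):
        """Return True if success should be reported to the source process."""
        if damaged or packet_id not in self.pending:
            return False            # damaged ACK, or ACK for a discarded packet
        del self.pending[packet_id]
        return True

    def on_tick(self, now):
        """Return (packets to retransmit, IDs given up on) at time `now`."""
        retransmit, gave_up = [], []
        for packet_id in list(self.pending):
            data, first_sent, last_sent = self.pending[packet_id]
            if now - first_sent >= self.D:
                del self.pending[packet_id]      # assumption: drop after quitting
                gave_up.append(packet_id)
            elif now - last_sent >= self.R:
                self.pending[packet_id][2] = now
                retransmit.append((packet_id, data))
        return retransmit, gave_up
```

A caller would pass each `(packet_id, data)` pair through its own model of the medium, feed surviving packets to `receive`, and feed surviving ACKs to `on_ack`.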

The above protocol definition assumes that all damaged
packets and acknowledgements will be detected. In fact it is
not possible to detect all transmission errors, resulting in
occasional acceptance of a faulty packet or ACK. However, the
probability of an undetected error can be made extremely small
at modest cost by use of well known coding techniques, and we
will continue to assume perfect error detection.

Although the above protocol definition is quite
specific, it still serves to define a class of protocols
equivalent for purposes of reliability analysis. Different
mechanisms for unique identifier selection, for example, or even
additional protocol mechanisms such as negative acknowledgements
to stimulate retransmission are included in this class of
protocols. These differences may have important effects on
efficiency or cost of implementation, but do not alter the
reliability of the protocol.

Having specified a class of simple PAR protocols, we now
show that this class satisfies several of the reliability
performance goals defined in section 2.

THM 1: A correctly functioning PAR protocol with infinite quit
time never loses, duplicates, or fails to deliver packets.

THM 1A: A correctly functioning PAR protocol with finite quit
time never loses or duplicates packets, and the probability of
failing to deliver a packet can be made arbitrarily small by the
sender.

PROOF:

DUPLICATION:

No duplicate packet generated by the sending
discipline or transmission medium will ever be delivered to
the process, because in checking the list of received-packet
ID’s, the receiving discipline will discard them.

LOSS AND FAILURE TO DELIVER:

There is a nonzero probability that the transmission
medium will successfully transmit a packet. Hence an
infinite quit time implies eventual successful delivery with
probability one. (However this may take a long time if the
transmission medium is highly unreliable!)

For finite quit times, the time may be exceeded
before successful transmission. However, the process is
notified that the packet may not have been delivered (it
also may have been delivered if the ACK’s are lost), and can
command the protocol to reset the quit time and continue, or
give up. The protocol never reports successful delivery
falsely, and the process can make the probability of failure
to deliver arbitrarily small by increasing the quit time.
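
The last claim can be made concrete with a small calculation. The sketch below rests on an assumption that the proof does not need: each transmission is taken to arrive intact independently with the same probability `p_success`. Under that assumption the chance that none of n transmissions arrives is (1 - p)^n, which falls below any positive target once n is large enough. The function names are hypothetical.

```python
def failure_probability(p_success, attempts):
    """Probability that none of `attempts` independent transmissions arrives intact."""
    return (1.0 - p_success) ** attempts


def attempts_needed(p_success, target):
    """Smallest number of transmissions whose failure probability is <= target."""
    if not (0.0 < p_success <= 1.0):
        raise ValueError("p_success must be in (0, 1]")
    if not (0.0 < target < 1.0):
        raise ValueError("target must be in (0, 1)")
    attempts = 1
    while failure_probability(p_success, attempts) > target:
        attempts += 1
    return attempts
```

With a retransmission interval R, allowing n transmissions corresponds to a quit time on the order of n times R, so a larger quit time buys a smaller failure probability.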

We now examine the consequences of protocol failures in
either the receiving discipline or the sending discipline. This
analysis was suggested in an informal note by Belsnes (1974a).

THM 2: A PAR protocol that is functioning incorrectly because
the  received-packet  ID  list  is  lost  (receiver  crashes  and 
restarts)  will  either  lose  packets,  generate  duplicate  packets, 
or  fail  to  deliver  packets,  and  the  failure  probability  cannot 
be  made  arbitrarily  small  by  the  sender. 

PROOF: Suppose the protocol was initially functioning
correctly. Let side A be sending packet X to side B.

Suppose that when B fails, it loses its received-packet ID
list, but then continues to function normally. Suppose the
original transmission of X arrived intact at B and was
delivered, but the ACK was damaged or delayed. Then B
fails, clearing its received-packet ID list. A
retransmission of X then arrives, and is not detected as a
duplicate, hence is delivered to the process.

Alternatively, suppose that when B receives any packet from
A after failing, it notifies A of the failure, and rejects
any packets until the protocol is initialized again. In
this case A reinitializes the protocol (by some foolproof
means beyond the scope of this analysis). But then A must
decide what to do about X:

If A sends X, it may be a duplicate as above.

If A doesn’t send X, and reports success, the packet may
be lost (if B failed before receiving a good copy of X).

If A notifies the process of the failure and the
uncertain fate of X, the process has the same
possibilities for failure:

Continue trying to send X, which may result in a
duplicate as above. (This couldn’t happen in THM
1.)

Give up, which may be a failure to deliver X.
Furthermore, the sender cannot make the probability
of failure to deliver arbitrarily small by changing
parameters available to him, since this failure
depends on the reliability of the receiver.
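
The first case of the proof can be replayed in a few lines. This is an illustrative sketch only: the receiver class, its `crash_and_restart` method, and the packet ID 7 are hypothetical, and the lost ACK is modelled simply by having the sender transmit X a second time.

```python
class Receiver:
    """Simple PAR receiving discipline with a received-packet ID list."""

    def __init__(self):
        self.received_ids = set()
        self.delivered = []          # packets already handed to the process

    def receive(self, packet_id, data):
        if packet_id not in self.received_ids:
            self.received_ids.add(packet_id)
            self.delivered.append(data)
        return packet_id             # ACK references the packet's ID

    def crash_and_restart(self):
        # The ID list is lost; packets already delivered stay delivered.
        self.received_ids = set()


def retransmission_of_x(crash_between):
    """Deliver X, optionally crash B, then let a retransmission of X arrive."""
    b = Receiver()
    b.receive(7, "X")                # original transmission; its ACK never reaches A
    if crash_between:
        b.crash_and_restart()
    b.receive(7, "X")                # A's retransmission of X
    return b.delivered
```

Without the crash the retransmission is recognized and discarded; with the crash the process at B receives X twice.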

THM 3: A PAR protocol that is functioning incorrectly because
the sending discipline loses track of ID’s used or packets
pending (sender crashes and restarts), will either lose packets,
fail to deliver packets, or force the sending process to
duplicate packets.

PROOF: 

LOSS: If the sender loses track of ID’s, and reuses an ID
for a new packet, the receiver will ACK it but discard the
packet as a duplicate. However, the sender will receive the
ACK and report successful delivery.

FAILURE TO DELIVER: If the sending discipline loses packets
that have been transmitted, but not yet acknowledged, it
ceases to retransmit them, and they are not delivered.
Furthermore the process may not even be notified of the
failure.

DUPLICATION:  If  the  sending  process  tries  to  recover  from 
the  absence  of  either  a success  or  quit  notification  from 
the  protocol  by  resending  a packet,  the  packet  may  have 
already  been  delivered  before  the  failure. 
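
The LOSS case can be replayed as follows. The sketch is illustrative only: both classes and the assumption that a restarted sender begins numbering again at 0 are hypothetical, and the medium is assumed to deliver every packet and ACK intact.

```python
class Receiver:
    def __init__(self):
        self.received_ids = set()
        self.delivered = []

    def receive(self, packet_id, data):
        if packet_id not in self.received_ids:
            self.received_ids.add(packet_id)
            self.delivered.append(data)
        return packet_id                      # always ACK an undamaged packet


class Sender:
    def __init__(self):
        self.next_id = 0
        self.pending = {}                     # id -> retained copy
        self.reported_success = []

    def submit(self, data):
        packet_id = self.next_id
        self.next_id += 1
        self.pending[packet_id] = data
        return packet_id, data

    def on_ack(self, packet_id):
        if packet_id in self.pending:
            self.reported_success.append(self.pending.pop(packet_id))

    def crash_and_restart(self):
        self.next_id = 0                      # assumption: ID counter is reset
        self.pending = {}


def id_reuse_demo():
    a, b = Sender(), Receiver()
    a.on_ack(b.receive(*a.submit("old")))     # normal exchange, ID 0
    a.crash_and_restart()
    a.on_ack(b.receive(*a.submit("new")))     # ID 0 reused for a new packet
    return b.delivered, a.reported_success
```

The receiver acknowledges the reused ID but discards the packet, so success is reported for a packet that was never delivered.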

Theorems 1-3 demonstrate the fundamental limitations of
PAR protocols: they successfully mask errors in the transmission
medium, but they cannot guarantee reliable transmission when
part of the protocol itself is violated due to failure of one
side or the other. The information maintained at both sides of
the protocol is necessary for correct functioning.

Many protocol designers persist in trying to get around
this fundamental limitation and "invisibly" recover from
failures by introducing more complicated control mechanisms,
usually involving reinitializing the connection [Mader74]. The
issue of (re)initializing a connection for reliable transmission
after a failure (cf section 5) is separable from the issue of
reliability within a connection. Theorems 2 and 3 show that
given certain types of failure, there can be no guaranteed
reliability with PAR type communication protocols.

Those desiring greater reliability may implement failure
recovery schemes at a higher (process) level (where they meet
the same problems), or reduce the possibility of protocol
failure with self checking or redundant machines, backup stores,
checkpointing, or other means.


4. PAR PROTOCOL WITH SEQUENCING

The basic PAR protocol above does not concern itself
with sequencing. When the characteristics of the transmission
medium include out-of-order delivery (frequently the case in
packet switching nets), the basic PAR protocol must be augmented
with a sequencing mechanism if correctly sequenced interprocess
communication is desired. This section incorporates such a
mechanism into a PAR protocol, resulting in a Sequencing PAR or
SPAR protocol. Delivering packets in order is now included in
the protocol performance requirements.

Sequencing is normally achieved by including a sequence
number (SN) in the control information attached to each packet
by the sending discipline. The receiving discipline uses SN to
determine the correct order of arriving packets. First we
describe a SPAR protocol using both sequence number and unique
identifier (cf section 3) fields in each packet. We show that
the sequence number may also serve as a unique identifier,
eliminating the need for a separate packet ID field. We then
define a class of simplified SPAR protocols and analyze its
reliability as in section 3.

DEF: A Sequencing Positive Acknowledgement, Retransmission
(SPAR) protocol is a PAR protocol with the following additions:

SENDING DISCIPLINE: The sending discipline maintains a
sequence number (SN). Each packet submitted by the process
has SN attached (along with ID), and then SN is incremented.

RECEIVING DISCIPLINE: The receiving discipline maintains an
expected sequence number (ESN). After discarding damaged
packets, the packet’s ID and SN determine the action to be
taken according to Table 1.


Table 1
Processing of Received Packets in SPAR Protocol

                          packet SN : ESN
          lower           equal                  higher

ID new    XXX             ACK, deliver to        discard as
                          process, INC, ENTER    out of order

ID old    ACK, discard    XXX                    XXX

ACK means transmit an ACK referencing ID;
INC means increment ESN;
ENTER means enter the packet’s ID in the
received-packet ID list;
XXX means this case does not occur.


The protocol is initialized when SN and ESN are equal to
each other (may be different in the two directions), no packets
have been sent, and both sides have empty received-packet ID
lists.

From Table 1 we see that the sequence number and
identifier fields in a packet maintain redundant
information: they are both duplicate suppressors. In
particular, the receiving discipline never needs to check the
received-packet ID list to detect duplicates, because the ESN
screening accomplishes this. Since it is easier to remember a
single ESN than a potentially infinite list of ID’s, the ID can
be dropped entirely from the SPAR protocol, with the sequence
number performing both the duplicate detection and sequencing
functions. The resulting simpler SPAR is specified as follows:


SENDING  DISCIPLINE  (see  figure  4): 

The sending discipline maintains a sequence number (SN).
Each packet submitted by the process has SN attached, and
then SN is advanced to its successor. (1) The packet is
transmitted, and a copy retained.

Arriving ACK’s are checked for errors, and damaged ones
discarded.

When  an  ACK  referencing  this  packet’s  sequence  number  is 
received,  the  retained  copy  is  discarded  (and  the  sending 
process  notified  of  success).  If  no  ACK  is  received  within 
the  retransmission  timeout  period  R,  the  copy  is 
retransmitted  and  the  cycle  repeated.  If  the  quit  time  has 
been  exceeded,  retransmission  is  suspended  (and  the  sending 
process  notified). 

ACK’s  for  discarded  packets  are  ignored. 


(1) The simplest and most widely used successor function is to
increment by one, although more complex successor relations have
been used to support priority (IMP-IMP protocol [McQuillan72])
or fragmentation and reassembly [Cerf74b].


FIGURE 4 SPAR PROTOCOL SENDING DISCIPLINE


FIGURE 5 SPAR PROTOCOL RECEIVING DISCIPLINE


RECEIVING DISCIPLINE (see figure 5):

The  receiving  discipline  maintains  an  expected  sequence 
number  (ESN). 

Each  packet  received  is  checked  for  errors,  and  discarded  if 
damaged. 

If not damaged, the packet’s sequence number (SN) is
compared with ESN. The receiving discipline operates as
follows on the basis of this comparison:

If less, transmit an ACK referencing the packet’s SN and
discard the packet as a duplicate.

If equal, transmit an ACK, deliver the packet to the
process, and advance ESN to its successor.

If  greater,  discard  the  packet  as  out  of  order.  (1) 


The  protocol  is  initialized  when  SN  and  ESN  are  equal  to 
each  other  in  both  directions  (see  section  5)  and  no  packets 
have  been  sent. 
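
The simplified receiving discipline is small enough to sketch directly. The code below is an illustration under stated assumptions: sequence numbers are unbounded integers, the successor function is increment by one, and the class and method names are hypothetical.

```python
class SparReceiver:
    """Simplified SPAR receiving discipline driven only by ESN."""

    def __init__(self, initial_esn=0):
        self.esn = initial_esn       # expected sequence number
        self.delivered = []          # packets handed to the process, in order

    def receive(self, sn, data, damaged=False):
        """Return the SN to acknowledge, or None if no ACK is sent."""
        if damaged:
            return None              # damaged packets are discarded silently
        if sn < self.esn:
            return sn                # duplicate: ACK again, do not deliver
        if sn == self.esn:
            self.delivered.append(data)
            self.esn += 1            # assumed successor function: +1
            return sn
        return None                  # sn > esn: out of order, discard
```

A packet that arrives early is dropped without an ACK, so the sender's normal retransmission brings it back after its predecessors have been delivered.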


Theorems  1-3  carry  over  straightforwardly  to  SPAR  protocols. 


THM 1B: A correctly functioning SPAR protocol with infinite quit
time  never  loses  packets,  duplicates  packets,  fails  to  deliver 
packets,  or  delivers  packets  out  of  order. 


PROOF: The first three parts are proved as in theorem 1 with
the sequence number acting as ID. If a packet ever arrives
at the receiving discipline before one of its predecessors,
the ESN check will cause it to be discarded. Only the next
packet in order can be delivered to the process.


(1) For greater efficiency, the receiving discipline may choose
to keep some number of out of order packets for a time. The
costs and benefits of such schemes will be discussed in a later
chapter.

THM 2A: A SPAR protocol that is functioning incorrectly because
the receiving discipline loses ESN (receiver crashes and
restarts), will either lose packets, duplicate packets, or fail
to deliver packets.


PROOF: Same as theorem 2 with ESN taking the place of ID.


THM  2B:  A malfunctioning  SPAR  protocol  where  ESN  and  SN  become 
desynchronized  may  completely  fail  to  deliver  packets. 

PROOF:  Desynchronization  may  occur  if  either  the  sending  or 
receiving  discipline  fails  to  maintain  SN  or  ESN  correctly. 
If  ESN  winds  up  below  or  above  the  sequence  number  of  all 
outstanding  packets  (outside  the  "window"  of  expected 
sequence numbers described in [Cerf74b]), the "expected"
sequence number will never appear at the receiving
discipline, and no packet will be accepted. Recovering from
such deadlocks requires resynchronizing the protocol as
discussed in section 5.

Even if SN and ESN are lost or misset, a
malfunctioning SPAR will not deliver packets out of order as
long as the ESN screening in the receiving discipline is
obeyed. (However, a series of in-order duplicates may be
delivered as in THM 2A.)
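
The deadlock of theorem 2B can be observed with a short simulation. This sketch is illustrative only: it assumes integer sequence numbers with successor +1, a medium that delivers every retransmission intact and in order, and hypothetical function names. Each round represents one retransmission of every packet not yet acknowledged.

```python
def spar_action(esn, sn):
    """Simplified SPAR receiving rule for an undamaged packet."""
    if sn < esn:
        return "ack_discard"
    if sn == esn:
        return "ack_deliver"
    return "discard"


def run_rounds(esn, outstanding_sns, rounds):
    """Retransmit all unacknowledged packets `rounds` times.

    Returns (delivered SNs, acknowledged SNs, final ESN).
    """
    delivered, acked = [], set()
    for _ in range(rounds):
        for sn in sorted(outstanding_sns):
            if sn in acked:
                continue
            action = spar_action(esn, sn)
            if action == "ack_deliver":
                delivered.append(sn)
                acked.add(sn)
                esn += 1
            elif action == "ack_discard":
                acked.add(sn)
    return delivered, sorted(acked), esn
```

With ESN at 3 and outstanding packets 10, 11, 12, no number of rounds delivers or acknowledges anything. With ESN at 20, the same packets are acknowledged but never delivered under the simplified rule, so again none is accepted.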


Theorem 1B shows that SPAR protocols provide the desired
reliability characteristics when functioning correctly. Under
protocol failures, however, invisible error-free recovery is
again impossible to guarantee, and SPAR protocol failures may
even result in total deadlocks of the communication path.

Both  the  infinite  sequence  number  space  assumed  in  this 
section  for  SPAR  protocols,  and  the  infinite  identifier  space 
assumed  for  PAR  protocols  in  section  3 are  impossible  in 
practice.  For  SPAR  protocols,  a finite  sequence  number  space 
places constraints on the volume of traffic transmitted. If the
maximum packet lifetime is L, no sequence number can be reused
for time L, limiting the rate of transmission. If the size of
the sequence number space is N, Cerf and Kahn (1974c) have shown
that at most N/2 packets can be outstanding (transmitted but not
yet  acknowledged)  at  any  time.  A suitable  modulo  N successor 
function  and  comparison  operations  are  also  required. 

If  these  constraints  are  violated,  "old"  packets  with 
acceptable  sequence  numbers  may  appear  at  the  receiving 
discipline  and  be  accepted  instead  of  the  current  packet  with 
the  same  sequence  number  (cf  section  5.2).  These  constraints 
must  be  included  in  the  protocol  specification  in  order  to 
assure  reliable  operation. 
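
One conventional way to realize a modulo N successor and comparison is sketched below. It is not taken from the references cited above: it assumes that at most N/2 packets are outstanding and that no packet older than the lifetime L survives, so that any arriving SN lies within N/2 of ESN. The function names are hypothetical.

```python
def successor(sn, n):
    """Modulo-n successor of a sequence number."""
    return (sn + 1) % n


def compare(sn, esn, n):
    """Classify sn relative to esn in a sequence space of size n.

    The forward distance from esn to sn is taken modulo n. Distances
    below n/2 count as 'higher' (ahead of esn); the rest count as
    'lower' (already received), which is unambiguous only while at most
    n/2 packets are outstanding.
    """
    distance = (sn - esn) % n
    if distance == 0:
        return "equal"
    if 2 * distance < n:
        return "higher"
    return "lower"
```

With N = 8, sequence number 0 compares as higher than 7 (it is the successor after wrap-around), while 7 compares as lower than 0.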

Similar constraints on the reuse of packet identifiers
by the sending discipline apply to (nonsequencing) PAR
protocols. Maintenance of the received-packet ID list by the
receiving discipline presents other difficulties. Received
identifiers must be removed from the list after time L so the
next use of the ID will be accepted. This may adequately reduce
the size of the list with low transmission rates or small L.
Further reductions may be accomplished by assigning identifiers
sequentially so that remembering a single ID can represent the
fact that all previous ID's have been received. Only the
relatively small number of noncontiguous ID's must be remembered
individually. Sequencing also provides a simple means of
generating unique identifiers at the sending discipline. Hence
sequence numbers provide the cheapest way to keep track of
packets already sent or received, even if the sequencing
information is not used to deliver the packets in order. Pouzin
(1974c) has described a combination bit map and sequencing
mechanism to further reduce storage requirements.
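
The list reduction described above can be sketched as follows. This is an illustration of the idea of remembering one ID for the contiguous prefix plus the noncontiguous remainder; it is not the bit map scheme attributed to Pouzin, and it assumes identifiers are assigned sequentially from 0 without wrap-around. The class name is hypothetical.

```python
class ReceivedIds:
    """Received-packet ID list compressed by sequential assignment."""

    def __init__(self):
        self.next_expected = 0   # every ID below this has been received
        self.extras = set()      # received IDs at or above next_expected

    def add(self, packet_id):
        """Record packet_id; return True if it had not been seen before."""
        if packet_id < self.next_expected or packet_id in self.extras:
            return False
        self.extras.add(packet_id)
        # Absorb any run of IDs that is now contiguous with the prefix.
        while self.next_expected in self.extras:
            self.extras.remove(self.next_expected)
            self.next_expected += 1
        return True
```

Storage is proportional to the number of gaps rather than to the number of packets received.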


5. CONNECTION ESTABLISHMENT

Sections  3 and  4 have  focused  on  the  operation  of  a 
communication  protocol  after  the  protocol  is  initialized.  The 
analysis  considered  a single  conversation  between  two  processes 
desiring to communicate with each other. This section examines
the additional issues involved in beginning and ending a
conversation.

After clarifying the concept of a connection between
processes for reliable communication, we discuss the actions
required to establish a connection and show that some simple
mechanisms proposed for this purpose are inadequate with a
hostile transmission medium. We present more robust connection
establishment mechanisms and demonstrate their correctness under
normal operation and the consequences of various failures.
Appendix  A develops  a state  diagram  model  for  representing 
connection  establishment  procedures  which  is  used  to  analyze 
both  simple  and  robust  establishment  mechanisms. 

The need to consider explicitly starting and ending
conversations arises for several reasons [Pouzin75]:

(1)  In  order  to  function  correctly,  the  protocol  must  be 
initialized  before  a conversation  begins. 

(2) In reality, many processes will want to communicate with
many other processes. If there are N processes, there are
N(N-1)/2 possible conversations (assuming no one talks to
himself), but the number of conversations actually active at
any moment will generally be far smaller. Without a
mechanism for starting and ending conversations on demand,
the state of all possible conversations must be maintained
perpetually at an impossible cost for even a moderate number
of processes.

(3) In the case of certain protocol failures (Host crashes),
the protocol must be reinitialized to allow reliable
communication to proceed from the time of failure (see
theorems 2-3).

(4) Processes may wish to make themselves available for
communication at some times, and refuse conversation at
other times.
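
The growth in reason (2) is easy to quantify. The helper below is a trivial sketch with a hypothetical name; it only evaluates the N(N-1)/2 count of unordered pairs of distinct processes.

```python
def possible_conversations(n_processes):
    """Number of unordered pairs of distinct processes: N(N-1)/2."""
    return n_processes * (n_processes - 1) // 2
```

For 1000 processes this already gives 499500 possible conversations, each of which would need permanently maintained state if connections could not be opened and closed on demand.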

5.1  Connection  Definition 

The notion of a conversation can be formalized as a
connection. A connection is a bi-directional communication
mechanism between two processes. A connection is uniquely
specified by a pair of processes. That is, once the idea of
multiple connections between various processes is introduced,
the communication protocol must provide a means for identifying
processes and hence connections. These process ID’s are called
addresses, and a connection is specified by a pair of addresses.
We denote connections by address pairs in angle brackets,
<address, address>.

To  start  a conversation,  two  processes  OPEN  a 
connection,  and  to  end  the  conversation  they  CLOSE  the 
connection. This leads to the specification of states of a
connection:

ESTABLISHED: when the protocol has been initialized and the
processes are free to exchange packets.

NOT  ACTIVE:  when  the  protocol  is  not  initialized  and  the 
processes  do  not  intend  to  communicate.  A minimum  of  state 
information  about  the  connection  is  maintained. 

The communication protocol attempts to establish a
connection upon a process’s request to OPEN a connection, and to
terminate the connection on the process’s command to CLOSE. An
incarnation  of  a connection  is  the  time  from  the  establishment 
to  the  closing  of  the  connection.  A connection  <A,B>  may  go 
through  many  incarnations  as  processes  A and  B open  and  close  a 
communication  path  over  time. 

Without fully specifying the details, we name the new
class of protocol that includes a mechanism for opening and
closing connections a Communication Control Protocol (CCP). A
CCP is a SPAR protocol with the additional mechanisms necessary
to reliably initialize and terminate the protocol.


Connection  Establishment 

To move a connection from the Not Active state to the
Established state, the protocol must be initialized, and the
connection may spend some time in an intermediate state called
OPENING. In going from the Established state to the Not Active
state, the protocol should terminate communication in an orderly
fashion (perhaps wait for outstanding packets to be received or
acknowledged), and the connection may spend some time in an
intermediate state CLOSING. (See figure 6)

It  is  important  to  note  that  the  protocol  disciplines  on 
the  two  sides  of  the  connection  may  think  that  the  connection  is 
in  different  states.  The  full  state  of  a connection  is 
specified by a pair of states, one for each side. We denote
connection states by state pairs in angle brackets, <state,
state>.
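The per-side states and the pair notation can be sketched as a small state table. This is an illustrative Python model only: the transition table assumes each side always passes through the intermediate Opening and Closing states, and the names `State`, `ALLOWED`, `step`, and `show` are hypothetical, not taken from the thesis.

```python
from enum import Enum


class State(Enum):
    NOT_ACTIVE = "N"
    OPENING = "O"
    ESTABLISHED = "E"
    CLOSING = "C"


# Transitions one side may make (assumption: intermediates are never skipped).
ALLOWED = {
    State.NOT_ACTIVE: {State.OPENING},
    State.OPENING: {State.ESTABLISHED},
    State.ESTABLISHED: {State.CLOSING},
    State.CLOSING: {State.NOT_ACTIVE},
}


def step(current: State, new: State) -> State:
    """Return the new state, or raise if the transition is not in the table."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


def show(pair) -> str:
    """Render the full connection state as a <state, state> pair."""
    return "<" + ", ".join(s.name for s in pair) + ">"


# Side A has begun opening while side B has not yet noticed anything.
a = step(State.NOT_ACTIVE, State.OPENING)
b = State.NOT_ACTIVE
print(show((a, b)))
```

Keeping one state variable per side, rather than one per connection, is what allows the two sides to disagree, which is the situation the later analysis has to rule out or recover from.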

The  correct  functioning  of  a protocol  can  now  be 
considered  in  terms  of  the  state  transitions.  Each  of  the  major 
states  above  may  have  a substructure  of  more  detailed  states. 
For  example,  the  exchange  of  data  packets  described  in  section  4 
occurs  with  both  processes  in  the  Established  state.  The 
analysis  of  possible  transitions  and  determination  of 
undesirable  states  is  an  extremely  useful  technique  for  protocol 
analysis as we shall see later in this section. But first we
examine  the  means  for  opening  and  closing  a connection. 

In the simplest system, connections might be opened and
closed by some means external to the communication system. For
example users might call each other up, or physically move from
one place to another and instruct the CCP to initialize or
terminate a connection. Such systems will be called externally
controlled. The coordination of the two sides is enforced
externally by some higher authority, subject to its own
validation problems.

However, external control is frequently not possible or
desirable. The most interesting and useful systems use the
transmission medium itself to control connections as well as to
communicate processes’ data. To this end, CCP’s exchange
control packets. Only such internally controlled systems will
be considered further, although the pitfalls discussed below
apply to externally controlled systems as well.

5.2  Opening  a Connection 

Suppose for concreteness that processes A and B wish to
open a connection. The primary task in opening the connection
<A,B> is to initialize the protocol. Each CCP has SN in the
sending discipline, and ESN in the receiving discipline as
described in section 4.

ESN(A) must be set to ISN(B) and ESN(B) must be set to ISN(A),
where ISN denotes the initial sequence number chosen by each
sending discipline.

5.2.1 Selecting ISN

The  conditions  for  initialization  of  SPAR  protocols 
required  that  there  be  no  packets  exchanged  between  A and  B. 
This  may  not  be  true  if  A and  B have  been  previously  connected, 
and  in  fact  packets  from  the  previous  incarnation  of  a 
connection  may  emerge,  due  to  delays  in  the  transmission  medium 
and out-of-order delivery, during the current incarnation. The
sequencing  mechanism  defined  for  SPAR  protocols  successfully 
handles  duplicates  within  a single  connection,  but  cannot  in  its 
simple  form  reliably  manage  opening  and  closing  connections.  In 
particular  if  ISN  is  picked  for  the  new  incarnation  so  that  some 
sequence  numbers  from  an  old  incarnation  are  reused,  errors  may 
occur. 

THM 4: A CCP that transmits packets undifferentiable as to
connection incarnation (by reusing sequence numbers) will lose
packets, duplicate packets, and deliver packets out of order.

PROOF: (see figure 7) Suppose packet X from an old
incarnation of connection <A,B> and packet Y from the
current incarnation of <A,B> are assigned the same sequence
number by A and transmitted to B. Furthermore suppose X was
retransmitted during the old incarnation. If the
(retransmitted) packet X arrives at B before Y, it may be
accepted and acknowledged in place of Y, and packet Y will
be discarded as a duplicate. A will receive the ACK and
think Y was successfully transmitted. Message X is
duplicated and delivered out of order, while message Y is
lost.

FIGURE 7 ERROR DUE TO REUSE OF SEQUENCE NUMBERS
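The failure in the proof of theorem 4 can be replayed with a toy receiving discipline. This is a hedged Python sketch, not the SPAR protocol itself: the `Receiver` class, the sequence number 2, and the packet labels are all made up for illustration, and the receiver is reduced to a single expected sequence number (ESN).

```python
class Receiver:
    """Toy receiving discipline: accepts only the expected sequence number."""

    def __init__(self, esn):
        self.esn = esn          # expected sequence number
        self.delivered = []     # data passed up to the process

    def receive(self, seq, data):
        if seq == self.esn:
            self.delivered.append(data)
            self.esn += 1
            return "ACK"
        return "DISCARD"        # looks like a duplicate or stray packet


# New incarnation starts at sequence number 2, a value the old
# incarnation also used (hypothetical numbers).
b = Receiver(esn=2)

# The delayed retransmission of X from the old incarnation arrives first.
first = b.receive(2, "X (old incarnation)")
# The genuine packet Y of the current incarnation arrives second.
second = b.receive(2, "Y (current incarnation)")

print(first, second, b.delivered)
```

The receiver has no basis for preferring Y over X, since both carry the sequence number it expects. The ACK generated for X is indistinguishable, at the sender, from an ACK for Y.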

This  failure  occurs  because: 

(1)  The  transmission  medium  can  delay  or  store  (retransmitted) 
packets  so  they  reach  their  destination  out-of-order  during 
a later  incarnation, 

(2)  Messages  from  old  incarnations  may  not  be  distinguished  from 
packets  of  the  current  incarnation. 

Accordingly, there are two types of solution to the problem:

(1) Suppose there is a maximum time L that a packet can be
stored in the transmission medium (see chapter I). Then if no
connection is opened before time L after its last closing, all
old packets will be gone, and any ISN may be used to initialize
the connection.

This solution requires CCP’s to remember for time L that
a connection was closed, and hence runs counter to the goal of
minimizing state information maintained for Not Active
connections. Furthermore, if a Host fails and forgets which
connections were recently closed, it must prevent opening
connections for ALL its processes for time L since any of them
might have had recently closed connections. The cost of this
type of solution is then storage of status information, and delay
in reestablishing connections after failures. When L is large,
these costs may be high. A recent proposal to CCITT for an
international standard of 30 seconds for L makes this approach
more attractive [INWG75].

(2) Be sure packets from the current incarnation can be
distinguished from old packets. Ways to achieve this second
type of solution include:

(a)  Set  ISN  to  the  last  sequence  number  from  the  previous 
connection.  This  also  violates  the  absence  of  state 
information  for  inactive  connections  because  the  last 
sequence  number  used  must  be  remembered  for  time  L on  every 
connection.  Once  time  L has  passed,  any  value  for  ISN  may 
be used. If a Host fails, all connections must wait time L
as  in  type  1 solutions. 

(b) Set ISN from a single clock for all connections at a
Host [Tomlinson74]. The clock value is the only state
information that must be preserved through inactive
connections and host crashes. This scheme requires
resetting the sequence number (resynch) if the clock cycles
around to where the sequence number is. The time until
resynch is required is determined by the sequence number
field size, clock rate, and connection traffic intensity
[Dalal74]. An additional cost of this mechanism is these
resynch tests.

(c)  Add  more  identifying  information  to  each  packet  so 
otherwise  identical  sequence  numbers  can  be  distinguished. 
This  requires  keeping  an  "incarnation  number"  for  each 
connection,  or  possibly  a global  single  ID  which  is  assigned 
to  each  new  incarnation,  and  then  incremented.  If  the  ID 
has  cycle  time  greater  than  L,  no  confusion  is  possible. 
For  the  single  global  ID,  only  a single  number  need  be 
remembered for all connections as in (b). Another field on
every packet sent is required, increasing overhead.
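Solution (c) can be sketched by tagging each packet with an incarnation number. The Python below is a minimal illustration under stated assumptions: the class name `TaggedReceiver`, the incarnation values 6 and 7, and the sequence number 2 are hypothetical, and cycling of the incarnation ID is ignored.

```python
class TaggedReceiver:
    """Toy receiver that checks an incarnation tag before the sequence number."""

    def __init__(self, incarnation, esn):
        self.incarnation = incarnation  # ID of the current incarnation
        self.esn = esn                  # expected sequence number
        self.delivered = []

    def receive(self, incarnation, seq, data):
        if incarnation != self.incarnation:
            return "DISCARD_OLD"        # packet from another incarnation
        if seq == self.esn:
            self.delivered.append(data)
            self.esn += 1
            return "ACK"
        return "DISCARD"


b = TaggedReceiver(incarnation=7, esn=2)

# Old packet X reuses sequence number 2 but carries the old incarnation tag.
old = b.receive(6, 2, "X")
# Current packet Y carries the same sequence number and the current tag.
new = b.receive(7, 2, "Y")

print(old, new, b.delivered)
```

The extra comparison is what the extra header field buys: identical sequence numbers no longer imply identical-looking packets, provided the tag itself is not reused within time L.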

In general, all solutions of type 2 may fail if the
state information which distinguishes previous incarnations is
lost. In this case the CCP must resort to a type 1 solution as
shown in theorem 5 below. To reduce the likelihood of failure,
the state information can be reduced to a minimum and maintained
by some specially reliable mechanism like an independent clock
or counter.

THM 5: A CCP with finite maximum packet lifetime L that fails by
forgetting the state of connections must either inhibit all
transmission for time L after the failure, or will lose packets,
duplicate packets, deliver packets out of order, and fail to
correctly initialize connections.

PROOF: When the CCP forgets the state of a connection, it
loses whatever state information is used to differentiate
packets from different incarnations of the connection as
described above. Then it may restart by resetting this
state information to a value used earlier, introducing
packets in the current incarnation that are undifferentiable
from packets of a past incarnation. Then by theorem 4,
packets may be duplicated or delivered out of order. In
particular, the control packets causing initialization
(discussed in the next section) may be lost or delivered out
of order, causing incorrect initialization. To avoid these
problems, the CCP must wait time L after a failure before
transmitting any packets.

Theorems 4 and 5 extend the results of theorems 2-3 to
CCP’s. Loss of state information allows new packets to be
transmitted on a connection when it is still possible for old
(retransmitted) packets that look the same to arrive at the
receiving discipline and be accepted instead. In practice a
combination of minimizing the possibility of state information
loss and waiting some time after restarts may reduce the
probability of confusion to an acceptably low level.
Transmission media that guarantee in-order delivery avoid this
problem.

Confusing packets from different incarnations of a
connection is not as unlikely a problem as might be supposed.
Many protocols use zero for the initial sequence number every
time a connection is established. In these protocols it is
quite possible to open a connection, close it, and reopen it
within a maximum packet lifetime L. In this event it is quite
likely that retransmissions from a previous incarnation will
emerge with correct sequence numbers to be accepted during the
current incarnation.

The worst difficulties occur when the control packet(s)
that initialize a connection get confused. Then one or both
CCP’s  may  think  the  connection  is  established,  but  SN  is  not 
equal  to  ESN  and  no  packets  can  be  successfully  transmitted.  A 
deadlock  occurs  which  must  be  broken  by  further  control  message 
exchanges  as  described  below,  or  by  some  external  means. 


Once ISN is selected for a new incarnation of a
connection, ESN must be set equal to ISN in both directions. To
accomplish this, each CCP may try to keep the same state
information that the other CCP uses to select ISN. This is not
always possible, and where it is possible, requires a lot of
work to synchronize all clocks, remember incarnation numbers for
all Hosts, or remember old sequence numbers for all old
connections. And all this information must be set somehow in
the "beginning" or after a failure.

This suggests that to establish a connection, the
sending discipline should transmit a synchronization control
packet (SYN) to the receiving discipline giving the value of ISN
(see figure 8). The receiving discipline can set ESN to this
value without maintaining any state information about its
partner CCP. The receiving CCP returns a SYN giving its own
ISN, or can reject any SYN that arrives when the protocol is in
an inappropriate state (the only appropriate state is the
Opening state where the process has signified its readiness to
converse, but the connection is not yet established).
Inappropriately timed arrivals are either old retransmissions,
protocol errors, or attempts to establish a conversation with an
unwilling partner.

Unfortunately,  this  simple  system  of  a credulous  CCP  is 
inadequate  when  packets  may  arrive  out-of-order  as  shown  by 
theorem 6 below. Once sequencing is initialized, sequence
numbers  serve  to  validate  incoming  packets.  But  while  the 
connection  is  being  initialized,  there  is  no  way  for  the 
receiving  discipline  to  validate  an  arriving  SYN  since  it 
maintains  no  state  information  about  the  other  side's  ISN. 

FIGURE 8 SIMPLE CONNECTION ESTABLISHMENT USING SYN CONTROL PACKET

THM 6: A CCP that maintains no state information about ISN for
the remote end of the connection, but accepts ISN from an
arriving SYN control packet, may incorrectly synchronize the
connection and cause a deadlock.

PROOF: (See figure 9) Suppose a SYN control packet with an
old ISN was retransmitted and delayed in the transmission
medium during a previous incarnation of the connection.
This old SYN may arrive just when the new connection is in
the opening state, and be accepted as valid. ESN will be
set to the old ISN, and the connection state set to
Established. But no data will be accepted as in theorem 2C.

"3 Way Handshake"

To avoid this problem, a more reliable means of
transmitting the current ISN to a CCP must be used. Tomlinson
(1974) has presented such a scheme called the "3 way handshake."
Instead of simply accepting an arriving SYN, the receiving CCP
must ask the sending CCP to verify the SYN as current. The
receiving CCP returns a SYN-Verify control packet to the sending
CCP which refers to the ISN from the SYN (see figure 10). If
the SYN was a current packet, the sender returns a positive
acknowledgement (ACK), and only then does the receiver accept
the SYN and set ESN. This synchronization must occur in both
directions, with the SYN-Verify also carrying ISN of the
receiver in the other direction.

FIGURE 10 "3 WAY HANDSHAKE" CONNECTION ESTABLISHMENT

If the SYN-Verify references an old ISN (See figure 11),
the sender returns a negative acknowledgement (NACK) and the
receiver discards the SYN. This takes care of old
(retransmitted) SYN or SYN-Verify packets.
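The verification step of the 3 way handshake can be sketched as follows. This is an illustrative Python model, not Tomlinson's specification: the class names, the ISN values 100, 500, and 900, and the direct function calls standing in for packet exchange are all assumptions, and loss, retransmission, and timers are left out.

```python
class Initiator:
    """Side that sends the SYN and later verifies it."""

    def __init__(self, isn):
        self.isn = isn      # ISN chosen for the current incarnation
        self.esn = None     # unset until the peer's ISN is learned

    def on_syn_verify(self, ref_isn, peer_isn):
        # Confirm only a SYN-Verify that refers to our current ISN.
        if ref_isn == self.isn:
            self.esn = peer_isn
            return "ACK"
        return "NACK"


class Responder:
    """Side that receives a SYN and asks for verification before trusting it."""

    def __init__(self, isn):
        self.isn = isn
        self.esn = None
        self.pending = None  # ISN from a SYN awaiting verification

    def on_syn(self, syn_isn):
        self.pending = syn_isn
        return ("SYN-Verify", syn_isn, self.isn)

    def on_reply(self, reply):
        if reply == "ACK":
            self.esn = self.pending   # SYN confirmed as current
        self.pending = None           # on NACK the SYN is discarded


def handshake(initiator, responder, syn_isn):
    """Deliver one SYN carrying syn_isn and run the verification exchange."""
    _, ref_isn, peer_isn = responder.on_syn(syn_isn)
    reply = initiator.on_syn_verify(ref_isn, peer_isn)
    responder.on_reply(reply)
    return reply


a, b = Initiator(isn=500), Responder(isn=900)
old = handshake(a, b, syn_isn=100)   # delayed SYN from an old incarnation
new = handshake(a, b, syn_isn=500)   # SYN of the current incarnation
print(old, new, b.esn, a.esn)
```

The point of the extra round trip is that the only party able to judge whether a SYN is current is the one that generated it, so the responder defers the decision to the initiator instead of keeping state about the initiator's ISN.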

Collision Avoidance

This "3 way handshake" mechanism for establishing
connections is inherently asymmetric, with one side initiating
the attempt by sending a SYN, and the other side waiting to
respond to a SYN from the active side. However, some processes
may not have agreed on an active and passive side and both sides
may attempt to initialize the connection. Then each side will
see a simple SYN rather than a SYN-Verify in response to its own
SYN. In this case a collision is said to occur, and the
collision resolution mechanism used in broadcast transmission
media [Abramson73a] may be applied. Both sides "forget" that
they have sent or received any SYN’s and wait a random amount
of time before trying to initialize the connection again.

Several authors have investigated the relationship
between retry intervals, propagation time, and time until
success in broadcast media [Metcalfe73, Abramson73a]. If the
retry time distribution is wide relative to the propagation
delay, then very likely one side will try again and have its SYN
delivered while the other side is still waiting, avoiding a
second collision. The collision avoidance mechanism simplifies
connection establishment since it reduces simultaneous
initiations to the more tractable one-sided attempts. This
application of collision avoidance to connection establishment
is believed to be a new technique.
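The claim about wide retry distributions can be illustrated with a simple calculation. The Python sketch below rests on assumptions that are not in the thesis: both sides draw their retry delay independently and uniformly from [0, W], and a second collision is taken to occur exactly when the two delays differ by less than the propagation delay T. Under those assumptions the probability of a second collision is 1 - (1 - T/W)^2 for T <= W.

```python
def second_collision_probability(width: float, prop_delay: float) -> float:
    """P(|d_a - d_b| < T) for independent delays uniform on [0, W]."""
    if width <= 0:
        raise ValueError("retry interval width must be positive")
    ratio = min(prop_delay / width, 1.0)   # T >= W makes a collision certain
    return 1.0 - (1.0 - ratio) ** 2


# Widening the retry interval relative to the propagation delay
# drives the chance of colliding again toward zero.
for width in (1, 10, 100):
    print(width, round(second_collision_probability(width, 1.0), 4))
```

In this simplified model the probability falls roughly as 2T/W once W is much larger than T, at the price of a longer average wait before the connection is established.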

In order to reduce the frequency of collisions,
Tomlinson has suggested that a CCP enter a special simultaneous
initialization state when it detects a collision [private
communication]. Dalal (1975) has developed algorithms which
allow the connection to be reliably established for "normal" but
simultaneous initialization attempts. However, if an "old" SYN
from a previous incarnation arrives during a simultaneous
initialization attempt, the CCP must still give up and retry
from scratch.


Correctness of Connection Establishment Mechanisms

Sections  5.1  and  5.2  have  shown  the  shortcomings  of  some 
simple  protocol  initialization  mechanisms,  and  suggested  more 
complicated  mechanisms  to  successfully  deal  with  transmission 
medium  characteristics.  In  this  section  we  prove  that  a 
correctly  functioning  CCP  using  the  ISN  selection  and  3 way 
handshake  mechanisms  described  above  does  indeed  correctly 
establish  connections  for  reliable  interprocess  communication. 

THM 7: A correctly functioning CCP (with infinite quit time)
using ISN selection and 3 way handshake mechanisms above,
correctly establishes connections despite transmission medium
characteristics of loss, delay, damage, duplication, and
out-of-order delivery of packets.

The proof of theorem 7 is based on a state diagram model
of the protocol process on each side of the connection. These
two processes interact by exchanging packets which we assume may
be lost, duplicated, or delivered out-of-order since we are
particularly interested in developing robust protocols for worst
case situations. Each protocol process is driven by events
including user commands, packet arrivals, and internal timers.

The  complete  state  of  the  system  includes  both  protocol 
processes’  states  and  the  packets  in  the  transmission  medium.  A 
large  reduction  in  complexity  is  achieved  by  classifying  all 
packets  in  the  transmission  medium  as  either  "current"  packets 
or "old" packets (cf. appendix A). Only current packets must be
explicitly  represented  as  part  of  the  composite  state. 

Appendix A proves theorem 7 and also reproves theorem 6
using the composite state formalism to show the correctness of a
powerful protocol and the inadequacy of a simple protocol for
connection establishment. Failure recovery techniques are also
considered.

5.3 Closing a Connection

The  purpose  of  closing  a connection  is  to  return  the 
connection  to  the  Not  Active  state,  freeing  the  resources 
associated  with  maintaining  the  connection.  The  tables, 
buffers,  and  other  data  structures  used  to  support  the 
connection  are  then  available  for  other  connections. 

The CLOSE command means that the process does not want
to  send  or  receive  any  more  packets.  Normally  processes 
exchange  data  signaling  the  end  of  their  conversation,  and  then 
request  the  CCP  to  close  the  connection.  In  this  case  when 
processes  on  both  sides  of  the  connection  request  termination, 
the  CCP  at  each  side  can  simply  return  all  resources  and  place 
the  connection  in  the  Not  Active  state  without  exchanging  any 
control  packets.  This  simple  scheme  relies  on  both  processes 
cooperating  to  close  the  connection. 

However,  some  processes  may  not  have  an  agreed 
termination  procedure,  or  one  process  may  wish  to  terminate  the 
connection  while  the  other  attempts  to  continue.  The  simple 
unilateral  termination  scheme  above  might  leave  the  connection 
with  one  side  in  the  Not  Active  state,  while  the  other  side 
thinks  the  connection  is  still  Established  and  continues  to  use 
transmission medium resources in useless (re)transmissions.

Since successful communication requires cooperation by
both sides, when either side attempts termination, both sides
should be closed. To accomplish this, when the CCP gets a CLOSE
request, it creates and transmits a termination control packet
(FIN). This control packet is easier to validate than the
initialization control packets discussed in section 5.2 because
the connection is already established.

5.3.1 FIN Mechanism (See figure 12)

The  sending  side  places  the  normal  next  sequence  number 
in  the  FIN,  and  the  receiving  side  uses  the  sequence  number  to 
determine  whether  the  FIN  is  valid  or  an  old  duplicate  just  as 
for data packets. Furthermore, the sequence number determines
exactly where in the data stream the FIN occurs, so that the
receiver  can  wait  for  any  outstanding  packets  if  the  FIN  has 
arrived  out  of  order.  The  receiving  discipline  returns  an  ACK 
for  the  FIN  just  as  for  data  packets.  It  then  notifies  the 
process  that  the  other  side  has  terminated  the  connection,  and 
places  the  connection  in  the  Not  Active  state. 

When the sending discipline sees the ACK for its FIN, it
knows that the other side has terminated the connection and it
can finish closing the connection on its own side. In this way
when either process closes the connection, the resources at both
sides are freed, and the state of the connection is kept
consistent at both sides without depending on process
cooperation. This mechanism allows both for cooperating
processes to terminate the connection in an orderly fashion
(after exchanging all desired data), and for one process to shut
off the other uncooperative process and prevent useless
activity.

To handle the case where both sides try to close the
connection and send FIN simultaneously, the mechanism used in
the ARPANET Host-Host protocol may be adopted [Carr70].
Instead of acknowledging a FIN with a normal ACK, the reply to a
FIN is another FIN. Then the initiating and replying
termination control messages are identical, and simultaneous
closes look like responses to both sides (see figure 13).

FIGURE 13 CONNECTION CLOSED WITH SIMULTANEOUS FIN PACKETS

5.3.2 Possibility of "Hung" Connections

Even with a FIN mechanism, a limited type of connection
state inconsistency is still possible in closing a connection.
To discuss this problem, we use the following notation: In
closing a connection, both sides must move from the Established
state (E) to the Not Active state (N) by passing through the
Closing state(s) (C). The possible connection states are then
<E,C>, <E,N>, <C,C>, <C,N>, and <N,N> counting symmetric
states only once.

While the connection is in the Established or Closing
states, the normal retransmission/duplicate detection mechanism
using sequence numbers masks the effect of loss, reordering, and
delay in the transmission medium. Once either side of the
connection reaches the Not Active state however, essentially all
information about the connection is lost and arriving packets
are simply discarded (except SYN to start a new connection).
This is exactly what is desired in the <N,N> state, but
deadlocks are possible in the <E,N> and <C,N> states.
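The state pairs reachable during closing, and the two singled out as deadlock-prone, can be enumerated mechanically. This is a small Python sketch; the variable names are illustrative, and the starting state <E,E> is left out to match the list in the text.

```python
from itertools import combinations_with_replacement

STATES = "ECN"  # Established, Closing, Not Active

# Unordered pairs, so symmetric states such as <E,N> and <N,E> count once.
pairs = [p for p in combinations_with_replacement(STATES, 2)
         if p != ("E", "E")]

# One side has discarded its state while the other still expects replies.
deadlock_prone = [p for p in pairs if "N" in p and p != ("N", "N")]

print(["<%s,%s>" % p for p in pairs])
print(["<%s,%s>" % p for p in deadlock_prone])
```

The deadlock-prone pairs are exactly those in which one side is Not Active and the other is not, which is the asymmetry the rest of this section tries to eliminate.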

The  unilateral  close  mechanism  allows  the  <E,N>  state  to 
persist  without  any  failure  in  the  transmission  medium,  but 
because  the  processes  fail  to  agree  on  closing  the  connection. 
This  can  be  avoided  by  requiring  the  exchange  of  FIN  control 
packets  as  described  above. 

The  FIN  scheme  prevents  the  <E,N>  state,  but  results  in 
the  <C,N>  state  if  the  ACK  of  a FIN  is  lost  in  the  transmission 
medium.  In  this  case,  retransmissions  of  the  FIN  from  the 
Closing  side  are  discarded  because  the  connection  has  already 
been  inactivated.  It  is  appealing  to  try  to  solve  the  problem 
by  introducing  another  stage  in  the  control  packet  exchange 
where  the  respondent  to  the  FIN  returns  a FIN-Reply  control 
packet  and  does  not  inactivate  the  connection  until  receiving  an 
ACK for the FIN-Reply. Unfortunately this only shifts the
problem to the other side and the final ACK.

THM 8: Any mechanism for closing connections in an internally
controlled CCP allows either <E,N> or <C,N> states to occur
where the connection will not terminate (enter the <N,N> state)
using the normal closing mechanism.

PROOF:  For  unilateral  termination  schemes  the  <E,N>  state 
can  persist  if  one  of  the  processes  does  not  close  its  side 
as  discussed  above.  For  schemes  involving  exchange  of  FIN 
control  packets,  the  <C,N>  state  occurs  when  the  C side  has 
sent  the  FIN  type  packet,  and  the  N side  has  received  this 
packet  and  returned  an  ACK.  If  the  ACK  is  lost  or  damaged, 
the  C side  retransmits  the  FIN,  but  the  N side  discards  the 
retransmissions because the connection is Not Active.
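The <C,N> case in the proof of theorem 8 can be traced with a toy exchange. The Python below is a hedged sketch: the function `close_exchange`, its `max_retries` parameter, and the single-letter state names are made up for illustration, and only the loss of the one ACK is modelled.

```python
def close_exchange(ack_lost: bool, max_retries: int = 3):
    """Return the final (closer, responder) states after a FIN exchange."""
    closer, responder = "C", "E"          # closer has just sent its FIN
    for _ in range(1 + max_retries):      # original FIN plus retransmissions
        if responder == "E":
            responder = "N"               # FIN accepted, ACK sent, state dropped
            if not ack_lost:
                closer = "N"              # ACK arrives, closer finishes
                break
        # Otherwise the responder is Not Active: the retransmitted FIN is
        # discarded and no ACK is ever produced again.
    return closer, responder


print(close_exchange(ack_lost=False))
print(close_exchange(ack_lost=True))
```

No number of retransmissions helps in the second case, because the only side able to answer has already forgotten the connection.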

As noted in Appendix A, such "hung" or "half open"
connections can also result from protocol failures where one
side of the connection must restart in the Not Active state. To
avoid such hung connections while closing a connection, three
types of solution exist:

(1) A timeout mechanism whereby one side of the connection
unilaterally "gives up" and goes to the Not Active state
when it gets tired of waiting. This can be explicitly
requested by the process (a sort of Reset command) or
automatically performed by the CCP. This corresponds to the
Quit time defined in section 2.

(2) A CCP in the Not Active state returns some special
control packet when it receives a packet for a connection it
considers inactive. The Connection Inactive control packet
is a kind of negative ACK and refers to the sequence number
of the arriving normal packet so the CCP that receives the
NACK can verify that it refers to a current packet. Of
course error packets are not returned for error packets.
When a CCP in the E or C state gets a NACK instead of the
expected ACK, it can close the connection. This corresponds
to the Reject mechanism added to the protocol for failure
recovery in Appendix A.

Another  similar  solution  for  a CCP  in  the  N state 
that  receives  a FIN  type  control  packet  is  to  construct  the 
appropriate  ACK  for  return  as  if  the  connection  were  still 
active.  This  avoids  special  processing  by  the  sender  in  the 
E or  C state  by  shifting  it  to  the  receiver  in  the  N state. 
This  is  not  always  possible  since  connection  state 
information is generally discarded when the connection
enters  the  N state,  and  the  protocol  may  not  know  how  to 
construct  an  appropriate  ACK. 

(3) When the CCP sends the final ACK before setting the
connection Not Active, it can send n copies of the ACK,
where n is large enough to "guarantee" that at least one
will get through. This brute force approach is actually
used in at least one protocol known to the author (an early
version of the Ethernet protocol at Xerox PARC used n=10).
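How strong the "guarantee" in solution (3) is can be estimated under one explicit assumption that the thesis does not make: each copy of the ACK is lost independently with the same probability p. Under that assumption at least one of n copies arrives with probability 1 - p^n. The Python sketch below (function name and loss rates hypothetical) evaluates this.

```python
def at_least_one_arrives(loss_prob: float, copies: int) -> float:
    """Probability that not every one of the copies is lost."""
    if not 0.0 <= loss_prob <= 1.0:
        raise ValueError("loss probability must lie in [0, 1]")
    if copies < 1:
        raise ValueError("at least one copy must be sent")
    return 1.0 - loss_prob ** copies


# Effect of sending n = 10 copies at a few illustrative loss rates.
for p in (0.5, 0.1, 0.01):
    print(p, at_least_one_arrives(p, 10))
```

If losses are correlated, for example when a link is down for the whole burst, the copies fail together and the estimate is optimistic, which is why the guarantee is only ever probabilistic.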


5.4  Reducing  Costs  of  a CCP 

It is apparent from the above discussion that
transmission medium characteristics of delayed out-of-order
delivery of (retransmitted) packets cause difficult problems for
reliable communication. One seemingly attractive approach to
this problem is to require a transmission medium that delivers
packets in order, or to implement a "low level" protocol
mechanism that orders packets on a Host-Host basis, creating a
first level virtual transmission medium that delivers packets in
order, and simplifies the interprocess protocol design.

The  direct  cost  of  this  approach  is  the  cost  of  the 
sequencing mechanism itself with its own initialization
problems.  Where  several  connections  share  the  same  Host-Host 
sequencing  mechanism,  significant  savings  may  result.  When  most 
connections  are  to  different  hosts,  the  two  level  mechanism, 
each  level  requiring  independent  state  information  and 
initialization,  may  result  in  increased  delay  and  higher  cost. 

The indirect cost of this approach is the interference
between different connections now sharing the same ordering
mechanism. When a packet from one connection is lost or
delayed, subsequent packets on other connections cannot be
delivered even though they would be in order on their own
connection. If packet loss recovery mechanisms are also shared,
then buffering constraints mean that a sluggish process that is
slow to accept its arriving packets may hold up all the other
processes sharing the same sequencing and error correction
channel.

One disadvantage of a CCP is the relatively large
overhead in packet headers and control packet exchanges
required. This cost is particularly heavy for short single
transaction applications [Kleinrock74]. Nevertheless, we have
shown that given the hostile transmission medium characteristics
described in chapter I, powerful mechanisms are necessary to
guarantee reliable communication. A partial solution for
transaction traffic may be to multiplex many transactions over a
single longer duration connection. This introduces the
interference between transactions mentioned above, but may be
justified by savings in overhead and connection set up activity.


Chapter III
PROTOCOL EFFICIENCY


1. INTRODUCTION

This chapter considers the efficiency of interprocess
communication protocols for computer networks. As with the
reliability performance discussed in chapter II, quantitative
performance delivered to processes by a communication protocol
must be based upon the performance of the transmission medium
underlying the protocol. Transmission medium characteristics
most important to efficiency are delay, bandwidth, maximum
packet size, and error characteristics.

To provide efficient interprocess communication based on
these transmission medium characteristics, a protocol can
attempt to optimize several internal parameters such as
retransmission interval, packet size, flow control strategy,
buffering, and acknowledgement scheme. Of course, much of the
performance seen by a process on one side is controlled by the
behavior of the other process with which it is communicating.
For example, a protocol cannot on the average provide throughput
to a source process that is greater than the acceptance rate at
the receiver. In general, the maximum performance possible
under ideal process behavior is of interest, as well as reduced
performance due to limiting process behavior on one or both
sides.

This chapter develops models to analyze the efficiency
of successively more complex protocols. The two main
performance measures chosen for analysis are average maximum
throughput and mean delay, since these represent the performance
of primary interest to processes using the protocol. By
throughput we mean the transmission rate of useful data between
processes, excluding any control information or retransmissions
that the protocol requires. By delay we mean the time from
starting to transmit a packet at the sender to successful
arrival of the entire packet at the receiver, or arrival of an
acknowledgement at the sender in the case of roundtrip delay.
We return to further define these performance measures later in
this section.

Other efficiency performance measures of interest
include retransmission rate, line efficiency, and buffer
requirements. Retransmission rate indicates the number of times
each packet must be transmitted and is a useful cost measure
since packet communication costs typically include a per packet
charge. Line efficiency is the ratio of useful traffic
(throughput) to total traffic generated by a protocol including
control information and retransmissions. It provides a measure
of the overall efficiency of a protocol by indicating the
fraction of total traffic that represents useful data. Buffers
are required at the sender to hold packets until acknowledged,
and at the receiver to hold packets until processed or for
sequencing out-of-order arrivals. Limited buffer storage
restricts the throughput and delay achievable.

Several authors have analyzed the efficiency of
communication protocols for simple transmission media with fixed
delay and no packet loss or reordering [Benice64a, Benice64b,
Danthine75c, Pouzin73a, Burton72, Sastry74]. This study
emphasizes performance analysis of protocols for interprocess
communication over packet switching nets (PSN) with more complex
and hostile transmission characteristics (cf section I-2).

Delay includes packet transmission time, or the time
required to transmit all bits of a packet into the transmission
medium (a function of the transmission medium bandwidth), and
propagation delay, or the time required for a bit to travel from
source to destination through the transmission medium. In a
store-and-forward PSN, the propagation delay may itself have
several components (cf section I-2) which we do not consider
further.

Frequently  it  is  important  for  the  sender  to  receive  a 
positive  acknowledgement  that  the  packet  was  delivered,  in  which 
case  the  roundtrip  delay  or  time  for  successful  delivery  and 
return  of  response  is  the  significant  measure.  A transaction 
system is an example of such a situation. Roundtrip delay
includes delay for a packet to reach the receiver, processing
time at the receiver, and delay for the response to reach the
sender.

If a sending process produces packets for transmission
at a high rate, the protocol may be unable to transmit packets
immediately as they are submitted. In this case, submitted
packets must be queued until they can be transmitted. The total
delay seen by the process will consist of the waiting time while
the packet is queued for service plus the normal delay to
successfully transmit the packet through the transmission
medium. During heavy demand periods, the total time to complete
a requested transmission may be dominated by the waiting time.
Under such conditions, the throughput is also important in
determining total completion time because it determines the rate
at which the waiting queue is emptied. To separate these
effects, we explicitly exclude the above waiting time from our
definition of delay.

While we define delay as an inherently single packet
phenomenon, throughput concerns performance for a stream of
packets. With simple protocols that transmit a single packet
and then wait for its acknowledgement, throughput is simply the
inverse of roundtrip delay multiplied by the useful bits per
packet. By taking advantage of the pipeline or multi-server
capacity of the transmission medium, a protocol can transmit
multiple packets while waiting for acknowledgements and achieve
higher throughput. The extent of this multiplexing is limited
by transmission medium capacity, flow control mechanisms, and
other constraints discussed in this chapter.

Achievable throughput is found to depend on six main
factors: overhead, retransmission or error recovery, flow
control or multiplexing, buffer allocation, receiver rate, and
transmission medium bandwidth. These factors in turn depend on
both protocol parameter settings and transmission medium
characteristics.
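
The relation stated above for a simple protocol that sends one
packet and then waits for its acknowledgement can be sketched as
follows. The function names are hypothetical. The second
function is an idealization that is not taken from the text: it
assumes that keeping several packets outstanding scales
throughput linearly until the medium bandwidth is reached, and
it ignores loss, overhead, and buffer limits.

```python
def stop_and_wait_throughput(useful_bits: float,
                             roundtrip_delay: float) -> float:
    """Useful bits per second when one packet is sent per roundtrip."""
    if roundtrip_delay <= 0.0:
        raise ValueError("roundtrip delay must be positive")
    return useful_bits / roundtrip_delay


def pipelined_throughput(useful_bits: float, roundtrip_delay: float,
                         outstanding: int, bandwidth: float) -> float:
    """Idealized throughput with several packets outstanding,
    capped by the bandwidth of the transmission medium."""
    if outstanding < 1:
        raise ValueError("at least one packet must be outstanding")
    rate = outstanding * stop_and_wait_throughput(useful_bits,
                                                  roundtrip_delay)
    return min(rate, bandwidth)
```

With 1000 useful bits per packet and a roundtrip delay of 0.5
seconds, the first function gives 2000 bits/sec regardless of how
fast the underlying line is, which is why multiplexing matters on
long delay paths.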

For  a simple  PAR  protocol  with  deterministic 
transmission  delay  on  a single  hop  transmission  line,  Metcalfe 
(1973)  has  evaluated  several  of  these  factors.  Section  3 
extends the analysis of the error factor to include more
realistic transmission delay functions for packet switching
networks, and to include the effect of varying retransmission
intervals. Section 4 considers the effect of flow control
mechanisms on the multiplexing factor. In section 5 we discuss
several acknowledgement, retransmission, and buffer allocation
strategies, and consider the throughput degradation resulting
from buffer limitations. Section 6 examines the effects of
requiring sequencing at the destination (SPAR protocols).
Section  7 briefly  discusses  the  impact  of  packet  size  on 
protocol  performance. 

Effective delay depends more directly on the
transmission medium characteristics, but packet size,
retransmission interval, and sequencing requirements also have
important effects. In general, minimum delay and maximum
throughput are conflicting goals [Crowther75, Opderbeck74], so
protocol parameters must be adjusted to provide the type of
service desired by a particular process (cf section 7).

In developing our efficiency analysis, we define a large
number of parameters and performance measures in this chapter.
To aid in remembering them, table 1 provides a list of names and
brief definitions for the more important terms along with page
numbers where they are first defined or discussed.


Table 1

Important Names and Variables Used in Chapter III


Name         Definition                                           Page

f(t)         Transmission delay distribution                        83
F(t)         Transmission delay cumulative distribution             87
P            Packet length                                          83
B            Bandwidth of transmission medium                       83
Tprop        Propagation time                                       83
P/B          Packet transmission time                               83
LS           Loss or damage probability                             83
TPmax        Average maximum throughput (useful data rate)          85
R            Retransmission interval                                86
H            Header length                                          85
D            Data length in a packet                                86
OH           Overhead                                               88
TPoh         Overhead factor in throughput                          87
g(t)         Successful transmission delay distribution             87
G(t)         Successful transmission delay cumulative distribution  87
DL           Mean delay until successful delivery of a packet       89
Ntrans       Mean number of transmissions                           89
TPretrans    Retransmission factor in throughput                    90
Nwin         Window size for flow control                          109
Tlocal       Packet transmission time (Host to Packet Switch)      112
Tnet         Roundtrip time through network less Tlocal            112
RHO          Ratio of service times or rates                       112
UT           Utilization of sender                                 113
Nwinmax      Window size allowing maximum throughput               115
Nbuf         Number of buffers at receiver                         123
Pfull        Probability that all buffers are full                 125
TPbuf        Buffer limitation factor in throughput                127
Tint         Time between new packet transmissions                 130
H(t)         Cumulative delay distribution with sequencing         131
DLseq        Mean delay including sequencing                       131
Pinord       Probability packet arrives in order                   135
Pdis         Probability packet is discarded                       136
W


2. SIMPLE PROTOCOL WITHOUT ERROR CORRECTION

Perhaps the simplest communication system consists of a
perfect (error free) line between two users with bandwidth
B bits/sec and nearly constant propagation delay Tprop. With no
errors, overhead for headers, or flow control, users simply
transmit data over this line, obtaining a maximum throughput
TPmax = B bits/sec. The mean delay to deliver a packet of
length P bits is:

T = P/B + Tprop = transmission time + propagation time (1)

The line efficiency of this ideal system is 1, giving a
transmission cost of 1 bit/bit.

To increase the generality of this model, we will
represent the propagation time for a packet as a probability
density function, f(t). To represent a nearly constant delay
Tprop, f(t) has a narrow, high peak at time t = Tprop (see figure
1a).

We also introduce the possibility of transmission
errors, and assume that damaged packets are detected and
discarded as described for PAR protocols in chapter II, but as
yet no positive or negative acknowledgements (ACKs or NACKs) are
returned for received packets. The probability, LS, of lost or
damaged packets (which may depend on the packet length) can be
included in f(t) as an impulse at t = infinity with value LS (the
probability that a packet never arrives) (see figure 1b).


FIGURE 1 TRANSMISSION DELAY DENSITY FUNCTION f(t)

(a) PROPAGATION DELAY DENSITY FUNCTION f(t) WITH NO PACKET LOSS

(b) PROPAGATION DELAY DENSITY FUNCTION f(t) WITH PACKET LOSS
PROBABILITY LS

(c) TRANSMISSION DELAY DENSITY FUNCTION f(t) INCLUDING PACKET
TRANSMISSION TIME P/B AND PACKET LOSS PROBABILITY LS

(Horizontal axis in each panel: TIME t; panel (c) marks
Tprop + P/B.)

Since we are primarily interested in PSN where the
end-to-end propagation time is much larger than the Host to
Packet Switch transmission time for a packet, it is convenient
to also include the transmission time for a packet of length P
in f(t), which will now also depend on P (see figure 1c). We
call the end-to-end (or roundtrip) delay distribution including
packet transmission time and packet loss or damage probability
the transmission delay density function, f(t).

The source can still transmit packets at rate B/P, but
only the fraction (1-LS) arrive successfully at the destination.
Hence the average maximum throughput is given by:

TPmax = B * (1-LS) (2)

The mean delay is strictly infinite since some packets (LS > 0)
never arrive. The line efficiency is (1-LS), but the
transmission cost is not well defined since some data is never
delivered.


3. PAR PROTOCOL (RETRANSMISSION)

The most important reliability performance goal
discussed in chapter II is to deliver each packet precisely
once. In a PSN environment where packet loss and duplication
occur, reliable communication requires a PAR type protocol as
described in section II-3. Adding the constraint that every
packet must be successfully delivered requires analysis of
retransmission and control overhead necessary for reliable
communication. This introduces the retransmission time interval
parameter, R: if an ACK is not received within time R after a
packet's last transmission, the packet will be retransmitted.

To provide error and duplicate detection with a PAR
protocol, each packet must carry some control information, or a
header of length H, in addition to data. The header typically
includes a checksum for error detection, and an identifier or
sequence number for duplicate detection. It may also include
address information, reverse ACKs, text length, or flow control
information. In general the header length is fixed, so P = H+D
where D is the (variable) data length. The fraction of each
packet taken up by the header will be called overhead, OH = H/P,
which varies from H/Pmax of a few percent, to H/H = 1 for control
packets with no data. The throughput obtained due to other
considerations must be multiplied by a factor TPoh to account
for the portion of bandwidth consumed by overhead:

TPoh(P,H) = 1 - H/P (3)
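
A minimal sketch of the overhead factor follows. The function
names and the header and data sizes in the usage note are
hypothetical; only the relations P = H + D, OH = H/P, and
equation 3 come from the text.

```python
def overhead(header_bits: float, data_bits: float) -> float:
    """OH = H/P, the fraction of each packet taken up by the header."""
    packet_bits = header_bits + data_bits      # P = H + D
    if header_bits < 0 or data_bits < 0 or packet_bits == 0:
        raise ValueError("need non-negative lengths and P > 0")
    return header_bits / packet_bits


def tp_oh(header_bits: float, data_bits: float) -> float:
    """Equation 3: TPoh = 1 - H/P, the share of bandwidth left for data."""
    return 1.0 - overhead(header_bits, data_bits)
```

For example, a 128 bit header carrying 896 data bits gives
OH = 0.125 and TPoh = 0.875, while a control packet with no data
gives OH = 1 and TPoh = 0.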


Retransmission introduces the first real difficulty in
analyzing quantitative protocol performance. Now we must find
the probability distribution for the first successful delivery
of possibly many (re)transmissions. To do so, we assume that
retransmissions take precedence over new transmissions, and
hence when a packet's retransmission time arrives, it is
immediately retransmitted. Preemption of a partially
transmitted packet is not allowed, so a retransmission may
actually have to wait for completion of a transmission in
progress, but we assume this waiting time is insignificant
compared to the retransmission interval R. This is a reasonable
assumption in a PSN where R is typically an order of magnitude
larger than a packet transmission time, P/B.

We also assume that the end-to-end delay density
function f(t) and its associated cumulative distribution F(t)
are identical for each (re)transmission of a packet, i.e. the
delays for (re)transmissions of a packet are independent. This
assumption is reasonable for large retransmission intervals
typical of PSN's where alternate routing and long paths minimize
dependence [Forgie75].

We can now write the successful transmission delay
distributions including retransmission, g(t) and G(t), in terms
of R and the basic transmission delay distributions f(t) and
F(t) directly from basic probability considerations:

\[
\begin{aligned}
G(t) &= \mathrm{Prob}(\text{at least one successful delivery by time } t) \\
     &= 1 - \mathrm{Prob}(\text{no success by time } t) \\
     &= 1 - \mathrm{Prob}(\text{1st transmission not arrived and 2nd trans. not arrived} \ldots
        \text{and } n\text{th trans. not arrived}), \qquad n = \lceil t/R \rceil \\
     &= 1 - \prod_{i=1}^{n} \mathrm{Prob}(i\text{th trans. not yet arrived}) \\
     &= 1 - \prod_{i=0}^{n-1} \left[ 1 - F(t - i \cdot R) \right]
\end{aligned}
\tag{4}
\]

\[
\begin{aligned}
g(t) &= \mathrm{Prob}(\text{first successful delivery occurs at time } t) \\
     &= \sum_{i=1}^{n} \mathrm{Prob}(\text{trans. } i \text{ arrives at time } t
        \text{ and no other transmission yet arrived}), \qquad n = \lceil t/R \rceil \\
     &= \sum_{i=0}^{n-1} \Bigl( f(t - i \cdot R) \prod_{\substack{j=0 \\ j \neq i}}^{n-1}
        \left[ 1 - F(t - j \cdot R) \right] \Bigr)
\end{aligned}
\tag{5}
\]
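
The two distributions can be evaluated numerically for any given
f(t) and F(t). The sketch below is one way to do so, assuming
transmissions start at times 0, R, 2R, ... and that the product
in g(t) runs over the transmissions other than the one that
arrives, which is what g(t) = d/dt G(t) requires. The function
names and the exponential example are illustrative choices, not
part of the analysis.

```python
import math


def G(t, F, R):
    """Probability of at least one successful delivery by time t."""
    if t <= 0.0:
        return 0.0
    n = math.ceil(t / R)                 # transmissions started before t
    none_arrived = 1.0
    for i in range(n):
        none_arrived *= 1.0 - F(t - i * R)
    return 1.0 - none_arrived


def g(t, f, F, R):
    """Density of the first successful delivery at time t."""
    if t <= 0.0:
        return 0.0
    n = math.ceil(t / R)
    total = 0.0
    for i in range(n):
        term = f(t - i * R)              # transmission i arrives at t ...
        for j in range(n):
            if j != i:                   # ... and no other has arrived yet
                term *= 1.0 - F(t - j * R)
        total += term
    return total


def exp_F(t, u=1.0, ls=0.0):
    """Example cumulative delay distribution: exponential with loss ls."""
    return (1.0 - ls) * (1.0 - math.exp(-u * t)) if t > 0.0 else 0.0


def exp_f(t, u=1.0, ls=0.0):
    """Density matching exp_F for finite t."""
    return (1.0 - ls) * u * math.exp(-u * t) if t > 0.0 else 0.0
```

With the exponential example, u = 1, no loss, and R = 1, two
transmissions are in progress for 1 < t < 2, so G(t) there is
1 - exp(-(2t - 1)) and g(t) is its derivative.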


Of course g(t) = d/dt G(t) as required for any
probability distribution, with the understanding that G(t) is at
least piecewise continuous (it may abruptly change slope at
points t = i*R) so that g(t) may have step discontinuities at
points t = i*R. Using equation 5 directly, it is also possible
to determine the probability mass function g(t) for a discrete
distribution f(t).

The mean delay including retransmission until the first
successful delivery is given by:

\[
\mathrm{DL}(R,F) = \int_{0}^{\infty} t \, g(t) \, dt
                 = \int_{0}^{\infty} \left[ 1 - G(t) \right] dt
\tag{6}
\]

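Another important measure is the mean number of
transmissions until the first success:

\[
\begin{aligned}
\mathrm{Ntrans}(R,F) &= \sum_{i=1}^{\infty} i \cdot
      \mathrm{Prob}(\text{first success between transmission } i \text{ and } i+1) \\
  &= \sum_{i=1}^{\infty} i \cdot \left[ G(i \cdot R) - G((i-1) \cdot R) \right]
\end{aligned}
\]

which telescopes to

\[
\mathrm{Ntrans}(R,F) = \lim_{n \to \infty}
      \Bigl[ n \cdot G(n \cdot R) - \sum_{i=0}^{n-1} G(i \cdot R) \Bigr]
\]

Noting that \lim_{n \to \infty} G(n \cdot R) must be 1 gives

\[
\mathrm{Ntrans}(R,F) = \lim_{n \to \infty} \sum_{i=0}^{n-1} \left[ 1 - G(i \cdot R) \right]
                     = \sum_{i=0}^{\infty} \left[ 1 - G(i \cdot R) \right]
\tag{7}
\]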


If F(t) represents the roundtrip delay distribution
(time from transmitting a packet until first ACK received), then
equation 6 gives the mean roundtrip delay for a successful
(acknowledged) transmission as a function of the retransmission
interval R. Equation 7 gives the mean number of transmissions
to achieve successful transmission as a function of R.
Typically packet communication costs are dominated by a per
packet charge, so Ntrans is also a good cost measure. Since
each successful transmission requires on the average Ntrans
actual transmissions, maximum throughput attainable will be
proportional to a retransmission factor:

TPretrans = 1/Ntrans (8)
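
Equations 6, 7, and 8 lend themselves to direct numerical
evaluation. The sketch below is one possible approach, not the
method used to produce the figures: it sums 1 - G(i*R) until the
terms become negligible and integrates 1 - G(t) by the midpoint
rule. The step size, tolerance, and function names are
assumptions made for illustration.

```python
import math


def G(F, R, t):
    """Equation 4: probability of a successful delivery by time t."""
    if t <= 0.0:
        return 0.0
    none_arrived = 1.0
    for i in range(math.ceil(t / R)):
        none_arrived *= 1.0 - F(t - i * R)
    return 1.0 - none_arrived


def ntrans(F, R, tol=1e-12, max_terms=100000):
    """Equation 7: sum of 1 - G(i*R) over i = 0, 1, 2, ..."""
    total = 0.0
    for i in range(max_terms):
        term = 1.0 - G(F, R, i * R)
        total += term
        if term < tol:
            break
    return total


def mean_delay(F, R, dt=1e-3, tol=1e-12, max_steps=10_000_000):
    """Equation 6: midpoint-rule integral of 1 - G(t) from 0 upward."""
    total = 0.0
    for k in range(max_steps):
        term = 1.0 - G(F, R, (k + 0.5) * dt)
        total += term * dt
        if term < tol:
            break
    return total


def tp_retrans(F, R):
    """Equation 8: retransmission factor in throughput."""
    return 1.0 / ntrans(F, R)


def exponential_F(u=1.0, ls=0.0):
    """Example F(t): exponential delay with rate u and loss probability ls."""
    def F(t):
        return (1.0 - ls) * (1.0 - math.exp(-u * t)) if t > 0.0 else 0.0
    return F
```

Both loops stop once 1 - G falls below the tolerance, so they
terminate quickly whenever the loss probability is well below 1.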


In general, a larger retransmission interval R allows
higher throughput since no bandwidth is "wasted" retransmitting
packets that might be long delayed and not actually lost. This
is true because the source can continue sending new packets
while it waits for ACKs of delayed packets (i.e. no sequencing
of packets is performed).

Smaller R reduces mean delay for two reasons:

1) A Loss factor. Packets actually lost or damaged are
retransmitted sooner.

2) An OR factor. Since all retransmissions are equivalent,
the OR function in accepting retransmissions selects the
minimum transmission time. The more retransmissions in
progress at once, the smaller the minimum time for one.

The remainder of this section examines several
representative transmission delay distributions, f(t), to
explore the resulting protocol performance as a function of the
retransmission interval R. The mean of each f(t) is fixed at
unity to facilitate comparison, while shapes and variances of
f(t) are varied. In each case, the resulting successful
transmission delay distributions, g(t) and G(t) from equations 5
and 4, are plotted for several values of R and packet loss
probability, LS. Then equations 6 and 8 are used to plot delay
and throughput as functions of R and LS. Finally, delay versus
throughput is plotted for each f(t).


3.1 Constant Transmission Delay

Figures 2 and 3 show the successful transmission delay
distributions g(t) and G(t) resulting from a constant
transmission delay function F(t) with constant delay D = 1 and
loss probability LS:

\[
F(t) =
\begin{cases}
0        & t < D \\
1 - LS   & D \le t < \infty \\
1        & t = \infty
\end{cases}
\]

For this simple F(t), analytic results are easily derived for
the mean delay until successful transmission, DL, and number of
transmissions, Ntrans:

\[
\mathrm{DL}(D,R) = D + \frac{R \cdot LS}{1 - LS}
\]

\[
\mathrm{Ntrans}(D,R) = \frac{1}{1 - LS} + \lfloor D/R \rfloor
\]

These results include the expected sum of a geometric
series, 1/(1-LS), since in this case transmission is just a
repeated series of independent trials, each with probability LS
of failure. Mean delay DL is just the fixed delay D plus a term
proportional to the retransmission interval R. Only the Loss
factor operates to lower delay; since F(t) is constant, there is
no overlap in f(t) from subsequent transmissions and the OR
factor is zero. Figure 4 shows mean delay DL as a function of R
for D = 1. The delay for a constant F(t) gives the upper bound
for delay resulting from other F(t) with nonzero variance where
the OR factor does contribute to reducing DL.

FIGURE 2 SUCCESSFUL TRANSMISSION DELAY PROBABILITY MASS FUNCTION g(t)
FOR CONSTANT TRANSMISSION MEDIUM DELAY D = 1

FIGURE 3 SUCCESSFUL TRANSMISSION DELAY CUMULATIVE DISTRIBUTION G(t)
FOR CONSTANT TRANSMISSION MEDIUM DELAY D = 1

Ntrans is just the mean number of trials for a Bernoulli
process, 1/(1-LS), plus the additional number of trials executed
until the success becomes "known" time D later. Figure 5 shows
the mean throughput, TPretrans, as a function of R.

Figure 6 shows delay versus throughput resulting from a
constant F(t) with D = 1. For realistic error rates (LS << 1),
R = D is clearly the optimal retransmission interval since there
is no throughput gain by waiting longer than D, and little delay
gain for retransmitting before time D. A constant transmission
delay function presents an unrealistically narrow delay
distribution, but it does capture the minimum delay behavior
typical of PSN's.
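
The closed forms for constant delay are easy to tabulate. The
sketch below reads the bracketed term in the Ntrans expression
as the integer part of D/R, which is our interpretation of the
notation; the function names are hypothetical.

```python
import math


def dl_constant(D: float, R: float, LS: float) -> float:
    """Mean delay until success: D plus R times the mean number of failures."""
    return D + R * LS / (1.0 - LS)


def ntrans_constant(D: float, R: float, LS: float) -> float:
    """Mean transmissions: geometric mean 1/(1-LS) plus the copies sent
    while the first success is still in transit (integer part of D/R)."""
    return 1.0 / (1.0 - LS) + math.floor(D / R)


def tp_retrans_constant(D: float, R: float, LS: float) -> float:
    """Retransmission factor in throughput, 1/Ntrans."""
    return 1.0 / ntrans_constant(D, R, LS)
```

With D = 1 and LS = 0.1, moving from R = 2 to R = 0.4 lowers the
mean delay only from about 1.22 to about 1.04, while the mean
number of transmissions rises from about 1.11 to about 3.11.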

3.2 Exponential Transmission Delay

Figures 7 and 8 show g(t) and G(t) resulting from an
exponential transmission delay function (with mean delay = 1):

\[
F(t) =
\begin{cases}
(1 - LS)\left(1 - e^{-u t}\right) & 0 \le t < \infty \\
1                                 & t = \infty
\end{cases}
\]

FIGURE 7 SUCCESSFUL TRANSMISSION DELAY PROBABILITY DENSITY FUNCTION, g(t)
FOR EXPONENTIAL TRANSMISSION MEDIUM DELAY WITH MEAN = 1

FIGURE 8 SUCCESSFUL TRANSMISSION DELAY CUMULATIVE DISTRIBUTION, G(t)
FOR EXPONENTIAL TRANSMISSION MEDIUM DELAY WITH MEAN = 1

Figures 7a and 8a show the successful transmission delay
distributions, g(t) and G(t), for several retransmission
intervals R but no packet loss (LS = 0). Smaller R moves the
delay density toward shorter times because of the significant OR
factor with the wide exponential f(t). Figures 7b and 8b show
g(t) and G(t) for several packet loss probabilities LS at a
fixed R. Smaller LS also moves the delay density to the left.

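For LS = 0, equations 6 and 7 readily yield analytic
expressions for mean delay DL and number of transmissions
Ntrans:

\[
\mathrm{DL}(u,R) = \frac{1}{u} \cdot \Bigl( 1 - \sum_{i=1}^{\infty}
      \Bigl[ \frac{1}{i \cdot (i+1)} \, e^{-i \cdot (i+1) \cdot R \cdot u / 2} \Bigr] \Bigr)
\]

\[
\mathrm{Ntrans}(u,R) = \sum_{i=0}^{\infty} e^{-i \cdot (i+1) \cdot R \cdot u / 2}
\]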

For  nonzero  LS,  numerical  solution  techniques  become  necessary. 
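
For the LS = 0 case the two series converge very quickly, since
the exponent grows as the square of the index. The sketch below
sums them directly, using the forms that follow from equations 6
and 7 with F(t) = 1 - exp(-u*t); the function names and the
fixed number of terms are illustrative assumptions.

```python
import math


def ntrans_exponential(u: float, R: float, terms: int = 200) -> float:
    """Sum over i >= 0 of exp(-i*(i+1)*R*u/2), valid for LS = 0."""
    return sum(math.exp(-i * (i + 1) * R * u / 2.0) for i in range(terms))


def dl_exponential(u: float, R: float, terms: int = 200) -> float:
    """(1/u) * (1 - sum over i >= 1 of exp(-i*(i+1)*R*u/2) / (i*(i+1)))."""
    series = sum(math.exp(-i * (i + 1) * R * u / 2.0) / (i * (i + 1))
                 for i in range(1, terms))
    return (1.0 - series) / u
```

As R grows, both results approach the values for a single
transmission: Ntrans tends to 1 and DL tends to the mean delay
1/u. For very small R*u more terms are needed before the series
settles.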

Figures 9 and 10 show mean delay DL and throughput
TPretrans for various R and LS. Results from the previous
section are shown dotted for comparison. The OR factor serves
to lower delay for small R because the wide exponential f(t) of
neighboring transmissions overlap significantly for small R. As
R increases, mean delay approaches the upper bound proportional
to R as with a constant f(t) where only the Loss factor is
contributing. Throughput rises smoothly with R because of the
wide spread of f(t).

The exponential transmission delay function presents a
wide delay distribution but does not incorporate a minimum
delay, opposite to a constant f(t). There is no optimal
operating point on the throughput vs. delay curves shown in
figure 11, but rather a smooth tradeoff of throughput for delay.


3.3 Erlangian Transmission Delay

The Erlangian distribution represents a more realistic
transmission delay, including a minimum transmission time,
moderate variance, and a small but long tail. Actually the
Erlangian is a family of distributions, with mean determined by
the parameter u and variance by the "shape" parameter k:

\[
f(t) =
\begin{cases}
(1 - LS) \cdot \dfrac{(k u)\,(k u t)^{k-1} \, e^{-k u t}}{(k-1)!} & 0 \le t < \infty \\[2ex]
LS \cdot (\text{unit impulse at } t = \infty)
\end{cases}
\]

The mean of the Erlangian distribution is 1/u while the variance
with mean of unity is just 1/k. This family conveniently models
a wide range of delay distributions from exponential (k = 1) to
constant (k = infinity). Figure 12 shows the Erlangian f(t) with
mean = 1 and k = 1, 4, 16.

FIGURE 12 ERLANGIAN PROBABILITY DENSITY FUNCTION, f(t), WITH MEAN = 1
AND SHAPE PARAMETER k = 1, 4, 16

Several authors have measured ARPANET mean delay times
under various conditions [Kleinrock74a, Cole71, Naylor73] and
more recently Forgie (1975) has obtained transmission delay
distributions under a limited set of circumstances. Even under
these limited circumstances, there is considerable variation in
the spread of the delay distribution, but the Erlangian
distribution with k = 16 provides a reasonable approximation to
real network transmission characteristics while remaining
computationally manageable. As we shall see below, protocol
performance is relatively insensitive to the exact shape or
variance of f(t) as long as the variance is not larger than one,
so a perfect representation of network delay is unnecessary.
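
The Erlangian density for finite t can be evaluated as sketched
below. The computation is done in the log domain so that large
k does not overflow the factorial; that choice, the function
names, and the numerical check of the moments are ours and are
meant only as an illustration.

```python
import math


def erlang_pdf(t: float, k: int, u: float = 1.0, ls: float = 0.0) -> float:
    """(1-ls) * (k*u) * (k*u*t)**(k-1) * exp(-k*u*t) / (k-1)! for t >= 0.
    The impulse of weight ls at infinity is not represented here."""
    if t < 0.0:
        return 0.0
    if t == 0.0:
        return (1.0 - ls) * u if k == 1 else 0.0
    rate = k * u
    log_density = (math.log(rate) + (k - 1) * math.log(rate * t)
                   - rate * t - math.lgamma(k))   # lgamma(k) = log((k-1)!)
    return (1.0 - ls) * math.exp(log_density)


def mean_and_variance(k: int, u: float = 1.0,
                      dt: float = 1e-3, t_max: float = 30.0):
    """Midpoint-rule estimate of the mean and variance of the
    no-loss density, used to check the 1/u and 1/k properties."""
    m0 = m1 = m2 = 0.0
    for s in range(int(t_max / dt)):
        t = (s + 0.5) * dt
        w = erlang_pdf(t, k, u) * dt
        m0 += w
        m1 += t * w
        m2 += t * t * w
    mean = m1 / m0
    return mean, m2 / m0 - mean * mean
```

With u = 1 the estimated mean stays at 1 for k = 1, 4, 16 while
the variance falls as 1/k, matching the narrowing of the curves
described for figure 12.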

Figure 13 shows the successful transmission delay
distribution g(t) resulting from an Erlangian f(t) with mean = 1
and k = 16. Figure 14 shows the cumulative delay distribution
G(t) for several retransmission intervals R and loss
probabilities LS. Again smaller R and LS move the distribution
to the left (shorter times) although not as much as with
exponential f(t).

Figure 15 shows mean delay DL as a function of R for
several LS. Results from the previous section are shown dotted
for comparison. The Loss factor and the OR factor both serve to
reduce delay for small R, but the OR factor is much less
pronounced than for exponential f(t) since the delay density is
more concentrated about the mean. For large R, mean delay again
approaches the upper bound proportional to R as for a constant
f(t) where the OR factor does not contribute at all.

FIGURE 13 SUCCESSFUL TRANSMISSION DELAY PROBABILITY DENSITY FUNCTION

FIGURE 15

Figure 16 shows mean throughput TPretrans as a function
of R for several LS. Throughput resulting from the moderate
variance Erlangian f(t) with k = 16 is already approaching the
step-like behavior derived for a constant f(t), with faster
approach to the limiting throughput for R>1 than with the wider
exponential f(t).

Finally, figure 17 shows delay versus throughput
resulting from the Erlangian f(t) with mean = 1 and k = 16. For
nonzero packet loss probabilities, a definite "knee" occurs
because delay increases linearly with R while throughput quickly
approaches its maximum with increasing R.

3.4  Results 

We have examined PAR protocol performance resulting from
varying the retransmission interval R with a wide range of
transmission delay distributions f(t) and packet loss
probabilities LS. Mean delay DL rises linearly with R and LS
for realistic values, as expected in a "repeat until success"
system. For R<1, DL drops somewhat more quickly due to the OR
factor described above. However, this effect is only
significant with high variance f(t), and is accompanied by a
large increase in the average number of transmissions required,
and hence a decrease in attainable throughput.

A throughput factor TPretrans equal to the inverse of
the number of transmissions required was defined and represents
the maximum average throughput attainable with a given R, taking
into account the fraction of bandwidth used in retransmission.
TPretrans asymptotically approaches its maximum of 1-LS for
large R (the mean number of transmissions then being 1/(1-LS)).
With realistic f(t) this results in the "knee" or
optimal performance area observed in delay versus throughput
curves. The location of this knee is determined primarily by
the mean and variance of f(t), and not by loss probability LS.
The knee is sharpest for small variances and occurs at a value
of R such that R is also the knee of the F(t) curve (i.e. the
packet has almost certainly arrived, if it is going to arrive, by
time R after transmission).

In summary, the best strategy for choosing a
retransmission interval R is to set R equal to the time when
"most" transmissions would have succeeded if there were no lost
or damaged packets. Larger R brings minimal improvement in
attainable throughput while increasing delay. Smaller R brings
significant throughput degradation with minimal decrease in
delay. However, for low total throughput requirements, mean
delay may be reduced by using a smaller R, but with a
substantial cost in additional retransmission.

For realistic error rates (LS<<1), mean delay is
insensitive to R, so a relatively wide range of R is near
optimal. Since network transmission delay varies with time,
using a somewhat larger fixed R is probably a good heuristic to
stay in the high throughput portion of the performance curve. R
may also be set dynamically on the basis of observed
transmission delays.
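
The trade-off summarized above can be explored with a small Monte Carlo sketch. It is not the analytic derivation used in this chapter. It assumes that copies of a packet are sent at times 0, R, 2R, ... until the first successful roundtrip completes, that each copy is lost independently with probability LS, and that roundtrip delay follows the Erlangian f(t) with mean 1 and k=16. The name simulate_par and its parameters are illustrative only.

```python
import random

def simulate_par(R, LS, k=16, mean=1.0, trials=20000, seed=1):
    """Estimate (mean delay, TPretrans) for a repeat-until-success sender.

    Assumed model (not taken from the text): copy j is sent at time j*R,
    is lost with probability LS, and otherwise completes its roundtrip
    after an Erlang(k) delay with the given mean.
    """
    if not (R > 0 and 0 <= LS < 1):
        raise ValueError("need R > 0 and 0 <= LS < 1")
    rng = random.Random(seed)
    total_delay = 0.0
    total_sent = 0
    for _ in range(trials):
        first_success = float("inf")   # earliest completion time seen so far
        j = 0
        # Keep sending copies until the send time passes the first success.
        while j * R < first_success:
            if rng.random() >= LS:
                d = rng.gammavariate(k, mean / k)   # Erlang(k) with this mean
                first_success = min(first_success, j * R + d)
            j += 1
        total_delay += first_success
        total_sent += j
    mean_delay = total_delay / trials
    tp_retrans = trials / total_sent   # inverse of mean transmissions per packet
    return mean_delay, tp_retrans

if __name__ == "__main__":
    for R in (0.5, 1.0, 1.5, 3.0):
        print(R, simulate_par(R, LS=0.1))
```

Under these assumptions, sweeping R shows the estimated throughput factor levelling off near 1-LS while mean delay keeps growing with R, which is the "knee" behavior described in the text.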

We are now able to include the effects of retransmission
and overhead in protocol performance. Equation 6 gives the
delay resulting from choice of R. The maximum average
throughput attainable, TPmax, is a product of the overhead
factor TPoh from equation 3, the retransmission factor TPretrans
from equation 8, and the transmission medium bandwidth B:


TPmax = TPoh • TPretrans • B    (9)

4. FLOW CONTROL

In section 3 we found the throughput limitation due to
retransmission of packets by deriving the fraction of available
bandwidth consumed by retransmissions. Another throughput
limitation results when roundtrip delay is large relative to
packet transmission time, as is frequently the case in packet
switching networks. In this case, the sender may be idle a
large fraction of the time waiting for an acknowledgement.

To achieve higher throughput, the sender may be allowed
to transmit multiple packets before receiving any
acknowledgements. Since each outstanding packet requires buffer
storage and other source resources, an important efficiency
question becomes: how large must the "window" of allowed
transmissions be in order to achieve maximum throughput?

For several reasons it is also desirable and even
imperative for source transmission rate to be limited. The
transmission medium itself may become congested due to excessive
traffic from all Hosts it serves, requiring some means of
congestion control to limit entering traffic. Several
techniques have been proposed to deal with network congestion
[Kahn72, McQuillan72, Davies72, Pouzin73b, Belsnes74,
Crowther75]. These constraints are generally enforced by the
transmission medium and are not under control of a communication
protocol, so we do not discuss them further.

More relevant to this study, transmission rate between
each source and destination process must be controlled to match
a sender's production rate to the receiver's consumption rate,
minimizing buffer storage and bandwidth requirements for the
resulting throughput. This presents the main problem of
interprocess flow control and suggests a second important
efficiency question: how small should the window of allowed
transmissions be in order to limit source transmission to a
given rate? Several authors have discussed techniques for end
to end flow control in PSN [Kahn72, Walden72, Carr70,
Zimmerman75, Pouzin74c, Cerf74c, Cerf75, Belsnes74, Crowther75,
Opderbeck74], but quantitative results have been lacking. In
many cases, results are complicated by sequencing or reassembly
requirements discussed in section 6.

Most strategies can be described in terms of a limited
window size, Nwin, of allowed transmissions [Pouzin74c,
Cerf74c]. In general a limit of Nwin packets (and/or bits) is
imposed such that up to Nwin packets (bits) may be transmitted
but not yet acknowledged at any moment. When the limit is
reached, the sending discipline stops transmitting new packets
until an ACK arrives, freeing space for new transmission. If R
is exceeded for some pending packet, the packet may still be
retransmitted.
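
The sending discipline just described can be sketched as a small bookkeeping class. This is only an illustration of the fixed-window rule: the class name WindowedSender and its methods are hypothetical, packets are counted rather than bits, and timers and actual transmission are left out.

```python
class WindowedSender:
    """Toy model of a fixed-window sending discipline (illustrative names)."""

    def __init__(self, nwin):
        self.nwin = nwin        # max packets transmitted but not yet acknowledged
        self.pending = set()    # sequence numbers awaiting an ACK
        self.next_seq = 0

    def can_send_new(self):
        # New transmissions are allowed only while the window is not full.
        return len(self.pending) < self.nwin

    def send_new(self):
        if not self.can_send_new():
            return None         # blocked by flow control until an ACK arrives
        seq = self.next_seq
        self.next_seq += 1
        self.pending.add(seq)
        return seq

    def may_retransmit(self, seq):
        # A pending packet whose interval R has expired may be resent
        # even while the window is closed to new packets.
        return seq in self.pending

    def ack(self, seq):
        # In the fixed-window case an ACK implicitly frees one window slot.
        self.pending.discard(seq)
```

With nwin=2, two calls to send_new() succeed, a third returns None, and the sender is unblocked again after ack(0).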

Strictly speaking, an arriving ACK functions only to
signal the error recovery mechanism that a packet has
successfully been received and no further retransmissions are
required. The credit granted for new transmission may be less
than, equal to, or greater than the amount acknowledged. The
separation of credits from ACKs is more fully discussed in the
next section, while in this section we consider only the fixed
window size case where an ACK implicitly grants permission to
send a new packet.

The limit of Nwin pending packets (or bits) functions as
flow control on the source of packets. We want to determine the
relation of Nwin to achievable throughput so the receiver can
select Nwin to limit throughput or achieve maximum throughput as
desired. As a crude means of flow control, the receiver could
simply discard arriving packets in excess of the rate desired,
but this strategy wastes transmission medium capacity, degrading
performance for other connections, and increases costs by
increasing the retransmission required. Hence it is desirable
to select Nwin to limit the sender's transmission rate so that
essentially all packets arriving at the receiver can be
accepted.

Figure  18  shows  a closed  network  of  queues  with  two 
servers  that  adequately  represents  the  constant  window  size  flow 
control  model.  Server  1 represents  a source  of  packets  to  be 
transmitted  serially  into  the  net.  The  infinite  server  system  2 
represents  the  (parallel)  transmission  of  packets,  processing  at 
the  destination,  and  return  of  ACKs  through  the  net  to  server  1. 


FIGURE 18  (labels: Nwin CUSTOMERS; Server 1, Tlocal)

There are Nwin "customers" in the system, so whenever all Nwin
packets are in service at server 2, server 1 is idle. These
idle periods represent the time a source would be blocked from
transmitting new packets by the flow control mechanism. (Note
that retransmissions of pending packets may occur during blocked
times in the real protocol which are not represented in the
queueing model; see below.)

This model is an instance of the classic machine
repairman problem [Cox61] with the emphasis on the transmitter
(repairman). We wish to determine how the throughput
(utilization) of server 1 depends on Nwin and the ratio of
service times of the two servers. Ideally the transmitter
should be busy all the time, but if Nwin is too small,
transmission is blocked (the repairman runs out of work).

Let Tlocal = the mean Host to packet switch transmission
time for a packet, and Tnet = the mean roundtrip time less
Tlocal. Let the ratio of the mean service times of the two
servers be:

RHO = (ST 1)/(ST 2) = Tlocal/Tnet = u2/u1
where u1 and u2 are the service rates of each server. Focusing
attention on server 1 (the sender), let n1 be the number of
customers queued or in service at server 1, and Pi = Prob(n1=i).
P0 is the probability that server 1 is idle (blocked from
transmitting by the flow control mechanism). When there are i
customers at server 1, there will be Nwin-i customers at server
2, and the arrival rate of new customers at server 1 will be
u2*(Nwin-i). This shows that the closed system in figure 18 is
equivalent to an open single server system with finite customer
population Nwin. Kleinrock (1975) gives the expression for 1/P0
in this system:

$$\frac{1}{P_0} = \sum_{i=0}^{\mathrm{Nwin}} \frac{\mathrm{Nwin}!}{(\mathrm{Nwin}-i)!} \cdot \mathrm{RHO}^{\,i}$$

Finally define the utilization of server 1, UT = 1-P0:

$$\mathrm{UT}(\mathrm{Nwin}, \mathrm{RHO}) = 1 - \frac{1}{\displaystyle\sum_{i=0}^{\mathrm{Nwin}} \frac{\mathrm{Nwin}!}{(\mathrm{Nwin}-i)!} \cdot \mathrm{RHO}^{\,i}}$$

In figure 19 we plot UT versus Nwin for various values of RHO
assuming exponentially distributed service times. Realistic
values for RHO in packet switching nets are typically around 0.1
since roundtrip delays are an order of magnitude larger than
packet transmission times. Figure 19 shows that utilization
(throughput) rises approximately linearly with window size up to
the half-way point of UT=0.5, and then more slowly approaches
unity with increasing window size. For smaller RHO, a larger
window size is necessary to keep the sender busy.
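
The utilization expression above is easy to evaluate numerically. The following sketch computes UT(Nwin, RHO) for the exponential case directly from the sum for 1/P0. The function name utilization_exp is illustrative, and the terms are built up incrementally only to avoid large factorials.

```python
def utilization_exp(nwin, rho):
    """UT = 1 - P0 for the finite-population (machine repairman) model.

    1/P0 = sum over i = 0..Nwin of Nwin!/(Nwin-i)! * RHO**i
    """
    term = 1.0       # the i = 0 term
    inv_p0 = 1.0
    for i in range(1, nwin + 1):
        # Nwin!/(Nwin-i)! * RHO**i = previous term * (Nwin - i + 1) * RHO
        term *= (nwin - i + 1) * rho
        inv_p0 += term
    return 1.0 - 1.0 / inv_p0

if __name__ == "__main__":
    for nwin in range(1, 21):
        print(nwin, round(utilization_exp(nwin, 0.1), 3))
```

For Nwin=1 the expression reduces to RHO/(1+RHO), the fraction of each cycle the sender spends transmitting when only one packet may be outstanding.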

FIGURE 19  FACTOR UT vs. FLOW CONTROL WINDOW SIZE Nwin FOR
VARIOUS RHO = Tlocal/Tnet
(RHO IS APPROXIMATELY THE RATIO OF PACKET TRANSMISSION TIME TO
ROUNDTRIP TIME IN THE TRANSMISSION MEDIUM)

An upper bound on UT can be found by considering the
deterministic system with constant service times: ST2 = n*ST1
(note n = 1/RHO). In this system a window size of exactly
Nwin = n+1 is required to keep the sender busy (ignoring errors),
since then the first ACK will be returning just as the n+1st
packet is transmitted. For Nwin < n+1, UT = Nwin/(n+1). For
other service distributions with coefficients of variation Cs
between 0 (constant) and 1 (exponential), Cox and Smith (1961)
suggest that the value of UT may be found by linear
interpolation with Cs squared between constant and exponential
values of UT:

$$\mathrm{UT} = \mathrm{UTconst} - \mathrm{Cs}^{2} \cdot (\mathrm{UTconst} - \mathrm{UTexp})$$
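
As a hedged illustration of the deterministic bound and the Cs-squared interpolation, the two helpers below take the constant-service utilization as min(1, Nwin/(n+1)) with n = 1/RHO and interpolate toward a supplied exponential value. The function names are hypothetical, and the exponential utilization is passed in rather than recomputed.

```python
def utilization_const(nwin, rho):
    """Utilization with constant service times: Nwin/(n+1), capped at 1."""
    n = 1.0 / rho                       # ST2 = n * ST1
    return min(1.0, nwin / (n + 1.0))

def utilization_interp(ut_const, ut_exp, cs):
    """Interpolate linearly in Cs**2 between constant and exponential UT.

    cs is the coefficient of variation of the service distribution,
    expected to lie between 0 (constant) and 1 (exponential).
    """
    if not 0.0 <= cs <= 1.0:
        raise ValueError("cs must lie between 0 and 1")
    return ut_const - cs ** 2 * (ut_const - ut_exp)
```

For example, with RHO=0.1 (n=10) a window of 5 gives a constant-service utilization of 5/11, and a window of 11 or more keeps the sender fully busy.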

We are now in a position to answer the two questions
posed above:

1) How large a window, Nwinmax, is required to achieve maximum
throughput?

Nwinmax = n+1, approximately Tnet/Tlocal

This corresponds to the intuitive approach of "keeping the pipe
full" between sender and receiver to achieve maximum throughput.
For roundtrip delay distributions with larger variance, a
slightly larger window is necessary.

2) How small a window should be used to limit throughput to a
particular value? Throughput rises approximately linearly with
window size up to Nwinmax. Thus to limit the sender to 0.1
nominal bandwidth B, the receiver should select a window size of
approximately 0.1*Nwinmax.

As remarked above, the queuing model of flow control
does not explicitly consider errors, retransmission, or
overhead. Each packet transmitted carries an overhead OH (cf
section 3). Furthermore, each customer served by server 1
actually consists of the original transmission followed by
Ntrans-1 retransmissions. These retransmissions are not
included in the utilization above, but must be included in
determining the limiting performance. Assuming retransmissions
at their assigned interval R take precedence over new
transmissions, the total traffic generated equals UT*Ntrans which
cannot exceed unity. Hence a smaller window size
Nwin = Nwinmax/Ntrans will generate the maximum allowable
traffic.


When UT, the rate of new data transmission, is small
because of window size limitations, the "extra" bandwidth is
available for any retransmissions necessary and throughput is
flow control limited. When UT approaches one due to a large
window size, retransmissions take precedence over new
transmissions that might otherwise be allowed, and throughput is
retransmission limited. Achievable throughput with both
retransmission and flow control effects is the minimum allowed
by either effect. Combining these results with equation 9,
maximum average throughput attainable becomes:

TPmax = TPoh • min(UT, TPretrans) • B    (11)

Another way to interpret the limit Nwin is as a source
imposed resource limitation. Since each packet transmitted must
be stored at the source until an ACK returns, buffer space may
become scarce when many connections become active. In sharing
its processing and storage capacity among many connections, a
protocol may limit the portion devoted to each connection and
reduce the window size per connection below what might be
allowed by the receiver. For example, an ARPANET TIP
[Ornstein72] limits each normal terminal user to a window of
G-12 characters in order to share its relatively scarce buffer
space among all users.

An alternative interpretation of the queuing model is to
consider the number of type 2 servers limited to N, but
unlimited customers, rather than limiting the total number of
customers in the system and having infinite type 2 servers.
This interpretation models the situation where network capacity
is the blocking factor: once the Host has transmitted N packets,
the network blocks further transmission until new permission to
send is returned. This was precisely the situation in a recent
ARPANET congestion control strategy: Source and destination
IMPs imposed a fixed window size of four messages [McQuillan72]
for traffic between each pair of Hosts.

Flow control constraints can affect total transmission
delay as well as throughput since a transmission request may
have to wait until window space is available. However, this
waiting time is the type of throughput dependent delay mentioned
in the introduction to this chapter and will not be considered
further. Cochi (1973) has treated increased waiting time in
systems of queues under similar circumstances where a server is
blocked from further processing because the next service
facility is full.

5. DESTINATION BUFFER ALLOCATION

The previous section considered the impact of source
buffering constraints on protocol performance. In this and the
following sections, we consider destination buffering
requirements and the performance degradation resulting from
limited destination buffer space. Many authors have discussed
storage allocation for related communication problems [Chu74]
such as switching node buffer requirements [McQuillan74,
Closs73, Fultz72, Danthine75c] and terminal data buffering
[Gaver71, Metcalfe73]. These analyses often assume constant
transmission delays over simple transmission lines. The
following analysis focuses on destination buffering strategies
for end-to-end protocols in a PSN with highly variable
transmission characteristics as discussed in chapter I.

Destination buffer or storage allocation policy is
closely connected with flow control. In particular, it is often
assumed that the window size and the buffer space allocated for
receiving packets must be identical. In fact, under ideal
circumstances when packets arrive uniformly spaced and in order,
double buffering is adequate to handle an arbitrarily large
window size and accompanying large throughput. Under these
conditions, the process can consume and return one buffer while
the other is being filled with an arriving packet. Real
situations probably require something between the extremes of
minimal and full buffering.

Larger storage allocation at the destination typically
becomes necessary for two purposes:

1) Smoothing uneven production and consumption rates. This
frequently occurs in multiprogramming systems where process
activity occurs in bursts.

2) Reordering packets arriving out of order. A sequencing
protocol must deliver packets in sequence, so early arrivals
must be held until their predecessors arrive. Any fragments
created between source and destination must also be
reassembled before delivery. This is discussed in the next
section.

Inadequate  buffer  space  for  either  purpose  results  in 
correctly  received  packets  being  discarded  because  there  is  no 
place  to  put  them.  Throwing  this  effect  into  the  loss  factor  LS 
in  the  delay  distribution  F(t)  confuses  all  the  earlier  results 
which  depend  on  F(t).  It  is  more  illuminating  to  preserve  the 
assumption  of  guaranteed  acceptance  at  the  destination  for 
previous  results,  and  introduce  an  independent  throughput 
degradation  factor,  TPbuf,  for  destination  buffer  allocation 
effects. 

5.1 Acknowledgement and Buffer Allocation

As observed in the last section, return of an
Acknowledgement to indicate successful receipt of a packet and
suppress retransmission need not be tied to return of credit, or
permission to advance the window allowing new transmissions. In
the simple implementations treated in section 4,
Acknowledgements and credits must be returned together. A
conservative policy then delays returning credits until new
receive buffer space has actually been made available by
"consuming" arrived packets or furnishing new space.
This policy increases roundtrip delay (which
includes destination processing time) and hence may reduce
throughput for a given window size as shown in section 4. Of
course this throughput limitation may be precisely what the
receiver desires in matching the source's production rate to his
own consumption rate.

Unfortunately, increasing roundtrip delay for
Acknowledgements also results in more retransmissions, higher
and less efficient line use for a given retransmission
interval R (cf section 3). R could be increased to
compensate, but this increases delay when packets are lost.
An alternative strategy involves returning Acknowledgements
immediately on successful receipt of a packet, and flow control
information at a possibly later time. In this case, the shorter
roundtrip delay for Acknowledgements is used in calculating an
appropriate R, while the longer roundtrip time for credits
governs throughput due to flow control. Overhead may increase
when control information is returned separately for error
recovery (Acknowledgements) and flow control (credits)
[Kleinrock74].

This  immediate  acknowledgement  strategy  may  reduce 
retransmissions,  freeing  transmission  medium  capacity  for  other 
users,  but  it  does  not  alter  throughput  between  its  own  users 
which  is  limited  by  the  roundtrip  time  for  credits. 
Acknowledgements  also  have  a somewhat  different  meaning  more 
like  "received"  than  "processed"  in  this  scheme. 

Both of the above conservative strategies guarantee that
an arriving packet will never have to be discarded for lack of
buffer space since credit for a transmission is only granted
when space actually becomes available. Unfortunately, since
storage space promised must really be available at the
destination, smaller window size per connection may be required.
Roundtrip time (at least for flow control credits) is also
increased because destination processing of the data is
included. Both these effects may reduce throughput.

An optimistic buffer allocation policy may allow higher
throughput by returning a window size larger than the buffer
space actually available. As long as transmission and
consumption proceed "smoothly," the receiver can provide less
buffer space than the window size returned without having to
discard packets. This essentially uses the storage space in the
network transmission path to provide the remaining space. The
difficulty in this scheme is that promised space may not
actually be available when new transmissions arrive. Some
packets must then be discarded and subsequently retransmitted.


5.2 Optimistic Buffer Allocation Strategy


Evaluating the performance of the optimistic strategy
involves determining the fraction of successfully arriving
packets that will be discarded given the buffer space available.
Another simple queuing model serves this purpose. Figure 20
shows a queue size limited single server system. Let the queue
size, Nbuf, be the buffer space available, and the mean service
rate, u, be the consumption rate of the receiving process. The
mean arrival rate, λ, equals the transmission rate of new
packets allowed by flow control and retransmission constraints.
We wish to find Pfull, the probability that all Nbuf buffers are
full. In the steady state, this is the probability that an
arriving packet finds the queue full and hence also the fraction
of all arriving packets that must be discarded.

Assuming exponentially distributed interarrival and
service times and defining RHO = λ/u (mean arrival rate over
mean service rate), Pfull is given by [Kleinrock75]:

$$\mathrm{Pfull}(\mathrm{Nbuf}, \mathrm{RHO}) = \frac{(1-\mathrm{RHO}) \cdot \mathrm{RHO}^{\mathrm{Nbuf}}}{1 - \mathrm{RHO}^{\mathrm{Nbuf}+1}} \qquad (12)$$

Figure 21 shows Pfull versus RHO for various values of Nbuf in
this M/M/1 system. Several observations can be made:

For RHO>>1, Pfull approaches (RHO-1)/RHO, indicating
that most arriving packets will be discarded regardless of the
size of Nbuf. In this case, the sender is transmitting packets
at a rate much greater than the receiver can accept them, and
throughput is receiver rate limited. By acknowledging packets
successfully received (but not yet processed), retransmissions
can be reduced. This reduces line utilization but still leaves
the receiver with all buffers full regardless of the size of
Nbuf. A better solution involves reducing the window size (and
Nbuf) to limit the sender's transmission rate.

For RHO<<1, figure 21 shows that Pfull approaches zero.
Very little buffering is necessary for a fast receiver. This
situation is typical of a fast process serving many slower
sources, and buffer pooling techniques may be advantageous
[Chu74].


For RHO=1, Pfull = 1/(Nbuf+1) in the M/M/1 system. For
more realistic distributions with smaller variances, Pfull drops
more quickly as the number of buffers increases. As remarked at
the beginning of this section, nearly constant interarrival and
processing times reduce Pfull to zero for small numbers of
buffers. However, periodic scheduling in multiprogramming
systems causes essentially bulk arrivals and processing: when
the sender is scheduled, a burst of traffic is generated which
accumulates at the destination until the receiver is scheduled.
If scheduling intervals are large compared to roundtrip times,
increased window size and buffer allocation may be necessary for
high throughput.
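
The three regimes discussed above can be checked with a direct evaluation of equation 12. This is a minimal sketch under the same exponential assumptions. The function name p_full is illustrative, and RHO=1 is handled separately because the closed form becomes 0/0 there.

```python
from math import isclose

def p_full(nbuf, rho):
    """Probability that all Nbuf buffers are full (equation 12)."""
    if isclose(rho, 1.0, rel_tol=1e-9):
        # Limit of equation 12 as RHO -> 1.
        return 1.0 / (nbuf + 1)
    return (1.0 - rho) * rho ** nbuf / (1.0 - rho ** (nbuf + 1))

if __name__ == "__main__":
    for rho in (0.1, 0.5, 1.0, 2.0, 10.0):
        print(rho, [round(p_full(nbuf, rho), 4) for nbuf in (1, 2, 4, 8)])
```

For large RHO the values approach (RHO-1)/RHO whatever the buffer count, while for small RHO they fall off rapidly as Nbuf grows.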


5.3  Results 

The preceding analysis determines the additional
throughput degradation TPbuf due to destination buffer
limitations. With conservative strategies, no packets are
discarded due to lack of buffer space since credits are only
returned when space is actually available. However, flow
control allocations are directly tied to buffer availability,
possibly resulting in smaller window size and longer roundtrip
times which both reduce achievable throughput (cf section 4).

With optimistic strategies, window sizes larger than the
available buffer space are allowed, increasing achievable
throughput or reducing buffer storage required. However, some
arriving packets may have to be discarded if promised space is
not actually available. Throughput degradation resulting from
overoptimism equals:

TPbuf(Nbuf) = 1-Pfull    (13)

Pfull depends on the ratio and smoothness of production and
processing rates as well as on the buffer space available. When
process scheduling delays are significant (compared to roundtrip
times) as in multiprogramming systems, the conservative strategy
with guaranteed buffer availability may be necessary to reduce
discard rates and hence retransmission cost to an acceptable
level.

Combining equation 13 with results from previous
sections, the maximum achievable throughput for PAR protocols
including the effects of overhead, retransmission, flow control,
and destination buffer storage limitations is:

TPmax = TPoh • min(TPretrans, UT) • TPbuf • B    (14)
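
A short worked example may help in reading equation 14. The helper below simply multiplies the factors. The name tp_max and the numbers in the usage lines are made up for illustration and are not results from this chapter.

```python
def tp_max(tp_oh, tp_retrans, ut, tp_buf, bandwidth):
    """Equation 14: TPmax = TPoh * min(TPretrans, UT) * TPbuf * B."""
    for name, x in (("tp_oh", tp_oh), ("tp_retrans", tp_retrans),
                    ("ut", ut), ("tp_buf", tp_buf)):
        if not 0.0 <= x <= 1.0:
            raise ValueError(name + " must lie between 0 and 1")
    return tp_oh * min(tp_retrans, ut) * tp_buf * bandwidth

if __name__ == "__main__":
    # Hypothetical values: 10% overhead, 5% retransmission cost, a window
    # that keeps the sender half busy, 2% discards, 50,000 bits per second.
    print(tp_max(0.9, 0.95, 0.5, 0.98, 50000.0))   # flow control limited
    print(tp_max(0.9, 0.95, 1.0, 0.98, 50000.0))   # retransmission limited
```

In the first call UT is the smaller term, so throughput is flow control limited. In the second the window is large enough that TPretrans becomes the binding factor.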

Buffering for rate smoothing purposes is an example of
the often discussed producer-consumer problem [Dijkstra68,
Coffman73]. In the distributed environment of computer network
communication, the producer and consumer may be more loosely
coupled than in centralized systems. Using the conservative
strategy causes the sender (producer) to be blocked from new
transmissions when the allocated space is full (the normal
situation in tightly coupled centralized systems). Using the
optimistic policy allows the sender to transmit new packets
which the receiver may have to discard if all buffers are full
(not normally allowed in centralized systems).

6. SEQUENCING

The qualitative performance goal of delivering packets
in order is relatively easy to implement as described in chapter
II by including a sequence number in the header of each packet
(SPAR protocol). However, introducing sequencing significantly
complicates quantitative performance analysis because delay for
neighboring transmissions can no longer be assumed independent.

In particular, a packet may arrive successfully at its
destination before one of its predecessors because the
transmission medium does not always deliver packets in the order
submitted (cf section 1-2). Briefly, this is due to alternate
routing and line errors followed by retransmissions within a
PSN, or to damage or complete loss of an earlier packet. If the
protocol allows packets to be transmitted in excess of buffer
space available at the destination (cf section 5), packets
arriving out of order may have to be discarded, requiring
retransmission and degrading throughput as we show in section
6.2. Even when buffer space is available, the packet is not
accepted (delivered to the receiving process) or acknowledged
until all its predecessors have successfully arrived. This
causes an explicit dependence among the transmission delays for
neighboring packets.

6.1 Increased Roundtrip Delay

We can derive a rough estimate of increased delay when
sequencing is required by using basic probability arguments
similar to those used in deriving equations 4 and 5. In this
case, we wish to find the time delay distribution for a packet
and all its predecessors to have arrived.

For  purposes  of  analysis,  a conceptual  change  in  the 
acknowledgement  strategy  proves  expedient.  Normally  an 
Acknowledgement  is  returned  for  an  arriving  packet  only  after 
all  predecessors  have  arrived.  Instead,  suppose  an 
Acknowledgement  is  returned  immediately  for  all  successfully 
received  packets,  regardless  of  arrival  sequence,  but  the 
Acknowledgement  is  not  accepted  ("believed")  until 
Acknowledgements  for  all  previous  packets  have  arrived.  The 
roundtrip  time  from  first  transmission  to  accepting  an 
Acknowledgement  is  the  same  for  both  schemes.  For  the  second 
scheme,  G(t)  from  equation  4 already  gives  the  roundtrip  delay 
time  distribution  for  each  Acknowledgement  to  return.  Only  at 
the  last  stage  in  the  analysis  does  the  dependence  on  previous 
transmissions  come  into  play. 

Let  Tint  be  the  (fixed)  interval  between  transmission  of 
sequential  packets  (ignoring  retransmissions).  Let  H(t)  be  the 
desired  roundtrip  delay  cumulative  distribution  including  the 
additional  delay  incurred  when  packets  arrive  out  of  order  and 
must wait for some missing predecessors to arrive.

$$
\begin{aligned}
H(t) &= \mathrm{Prob}\{\text{ACKs for packet } i \text{ and all predecessors have arrived by time } t\} \\
     &= \mathrm{Prob}\{\text{ACK } i \text{ arrives by time } t\}
        \cdot \mathrm{Prob}\{\text{ACK } i-1 \text{ arrives by time } t\}
        \cdot \mathrm{Prob}\{\text{ACK } i-2 \text{ arrives by time } t\} \cdots \\
     &= G(t) \cdot G(t + \mathrm{Tint}) \cdot G(t + 2\,\mathrm{Tint}) \cdots \\
     &= \prod_{j=0}^{\infty} G(t + j\,\mathrm{Tint})
\end{aligned}
\tag{15}
$$

Using equation 1 we can immediately write the mean delay
including sequencing, DLseq, as:

$$
\mathrm{DLseq}(R, F, \mathrm{Tint}) = \int_{0}^{\infty} \left[ 1 - H(t) \right] dt \tag{16}
$$
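
Equations 15 and 16 can be evaluated numerically. The following is a minimal sketch, not the computation used for the figures in this thesis. Equation 4 is not reproduced in this section, so the function `G` below is an assumed stand-in: a copy of the packet is sent every R time units, each copy independently succeeds with probability 1 - LS, and the medium delay is Erlangian with mean 1 and degree 16. The names `erlang_cdf`, `G`, `H`, and `mean_delay_seq` are illustrative only.

```python
import math

def erlang_cdf(t, k=16, mean=1.0):
    """CDF of an Erlangian delay with k stages and the given mean."""
    if t <= 0.0:
        return 0.0
    x = k * t / mean
    term, total = 1.0, 1.0
    for n in range(1, k):
        term *= x / n
        total += term
    return 1.0 - math.exp(-x) * total

def G(t, R, LS):
    """ASSUMED stand-in for equation 4: copies are sent at 0, R, 2R, ...
    and each one independently succeeds with probability 1 - LS."""
    if t <= 0.0:
        return 0.0
    miss = 1.0          # probability that no copy has succeeded by time t
    m = 0
    while m * R < t:
        miss *= 1.0 - (1.0 - LS) * erlang_cdf(t - m * R)
        m += 1
    return 1.0 - miss

def H(t, R, LS, Tint, eps=1e-12):
    """Equation 15: product of G(t + j*Tint) for j = 0, 1, 2, ...
    The product is truncated once a factor is within eps of 1 (needs LS < 1)."""
    prod = 1.0
    j = 0
    while True:
        g = G(t + j * Tint, R, LS)
        prod *= g
        if prod == 0.0 or 1.0 - g < eps:
            return prod
        j += 1

def mean_delay_seq(R, LS, Tint, t_max=20.0, dt=0.05):
    """Equation 16 by the trapezoidal rule on [0, t_max]."""
    steps = int(round(t_max / dt))
    ys = [1.0 - H(i * dt, R, LS, Tint) for i in range(steps + 1)]
    return dt * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
```

Under these assumptions, calling `mean_delay_seq(1.5, 0.1, 0.5)` and `mean_delay_seq(1.5, 0.1, 2.0)` compares a rapid and a slow transmission rate at the same loss probability; a very large Tint approximates the delay without sequencing.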

Figure 22 shows DLseq as a function of the
retransmission interval R for several packet loss probabilities
LS and transmission intervals Tint. Delay without sequencing is
shown dotted for comparison. The underlying transmission medium
delay, f(t), is the Erlangian distribution with mean = 1 and
degree k = 16 from section 3.3. The spread of f(t) represents
variation  in  transmission  medium  delay  due  to  alternate  routing 
and  PSN  internal  error  recovery.  The  loss  factor  and 
retransmission  interval  account  for  lost  or  damaged  packets  and 
end-to-end  error  recovery. 


FIGURE 22 MEAN DELAY INCLUDING SEQUENCING DLseq vs. RETRANSMISSION
INTERVAL R FOR SEQUENCING PROTOCOL

LS = PROBABILITY OF PACKET LOSS
Tint = TRANSMISSION INTERVAL OF NEW PACKETS
TRANSMISSION MEDIUM DELAY DISTRIBUTION f(t) IS ERLANGIAN (mean = 1, k = 16)

Results:

Small transmission intervals Tint (rapid transmission
rates) increase the delay required to resequence packets since it is
more likely that closely spaced packets will arrive out of
order. For low packet loss probabilities (LS << 1), the increase
is small for typical f(t) with low variance. However, as
network traffic increases, the variance of transmission time
also increases [Naylor73], and resequencing delays may become
more significant.

For  larger  LS,  a significant  fraction  of  packets  are 
delayed  by  one  or  more  retransmission  intervals  R.  When 
sequencing  is  required,  this  affects  the  preceding  packet  delays 
as well, amplifying the increase of delay with R noted in
section  3 by  a large  factor  depending  on  Tint. 

Reducing Retransmission Rate:

When a packet is damaged or lost in a sequencing
protocol,  not  only  the  Acknowledgement  for  that  packet,  but  also 
for  all  subsequent  packets  within  the  window  size  Nwin  will  be 
delayed  by  at  least  a retransmission  time  R.  Hence  it  is  likely 
that  the  retransmission  time-outs  for  all  the  other  pending 
packets  in  the  window  will  also  expire,  and  all  Nwin  packets 
will  be  retransmitted  following  the  faulty  one.  Some  protocols 
attempt to avoid this amplification of retransmissions by
suppressing retransmission of all but the first timed-out
packet.

Such single packet retransmission protocols then operate
in a bimodal fashion: In "normal" mode (no retransmission
time-outs have occurred), they transmit with window size Nwin
and throughput derived in section 4 above. In "error" mode,
they retransmit only the single timed-out packet at intervals R
until successful. This policy keeps transmission medium
bandwidth, retransmissions, and hence cost to a minimum (since
Acknowledgements for the other successfully received packets can
return before they are retransmitted). But delay is increased
and throughput decreased since one of the other (suspended)
packets may also have been lost or damaged. For realistic loss
probabilities (LS << 1), the savings and performance degradation
are minimal, but the protocol implementation may be
significantly simpler with such a single packet retransmission
strategy.

Returning negative acknowledgements (NACKs) for damaged
packets provides another way to reduce retransmission costs and
delay. The NACK can stimulate immediate retransmission of the
damaged packet before its normal retransmission time would
occur. If the second transmission is successful, a positive
acknowledgement of the missing packet and its successfully
received successors may reach the sender before a full window of
packets has been retransmitted. Although the protocol cannot
rely on NACKs for reliability (cf. section II-1), NACKs may
provide an improvement in efficiency when damage rates are
significant.


6.2 Discard Probability

As  shown  above,  protocol  performance  deteriorates  when 
sequencing is required even if there is enough destination
buffer space to hold all out of order arrivals. In this section
we examine the additional degradation resulting when out of
order  arrivals  must  be  discarded  due  to  insufficient  buffer 
space. 

Along  the  lines  of  section  5,  we  wish  to  find  the 
probability that an arriving packet must be discarded, Pdis.
This results in throughput degradation by a factor 1-Pdis
exactly as in section 5. Pdis is most easily derived by first
considering the probability that an arriving packet is in order,
with the possible exception of its n-1 most recent predecessors:

$$
\begin{aligned}
\mathrm{Pinord}(n) &= \mathrm{Prob}\{\text{when packet } i \text{ arrives, packets } i-n,\ i-n-1,\ i-n-2,\ \ldots \text{ have already arrived}\} \\
 &= \int_{0}^{\infty} \mathrm{Prob}\{\text{packet } i \text{ arrives at time } t, \text{ and packets } i-n,\ i-n-1,\ i-n-2,\ \ldots \text{ have arrived by time } t\}\, dt
\end{aligned}
$$

Note that Pinord(1) is the probability that an arriving
packet is in order (i.e. none of its predecessors is missing).
Once again, let Tint equal the (fixed) interval between
transmission of sequential packets. Let $\hat{G}(t)$ and $\hat{g}(t)$ be one
way delay time distributions including loss probability LS and
retransmission at interval R. (We estimate $\hat{G}(t)$ and $\hat{g}(t)$ by
the roundtrip delay distributions G(2t) and g(2t).) Then
Pinord(n) can be written:

$$
\begin{aligned}
\mathrm{Pinord}(n) &= \int_{0}^{\infty} \hat{g}(t) \cdot \hat{G}(t + n\,\mathrm{Tint}) \cdot \hat{G}(t + (n+1)\,\mathrm{Tint}) \cdots\, dt \\
 &= \int_{0}^{\infty} \hat{g}(t) \prod_{j=n}^{\infty} \hat{G}(t + j\,\mathrm{Tint})\, dt
\end{aligned}
\tag{17}
$$

Finally note that with a buffer size of n, the
probability that an arriving packet must be discarded is just
1-Pinord(n), the probability that at least one of the packet's
predecessors n or more away has not arrived yet:

$$
\mathrm{Pdis}(n, \mathrm{Tint}) = 1 - \mathrm{Pinord}(n) \tag{18}
$$
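
The discard probability of equation 18 can also be estimated by simulation rather than by integrating equation 17. The sketch below is a Monte Carlo stand-in and not the analytic evaluation used in this thesis. It assumes that each copy of a packet is lost independently with probability LS (LS < 1), that lost copies are resent after R, and that the one way medium delay is Erlangian with mean 0.5 (half the roundtrip mean) and degree 16. Like equation 17, it ignores the extra retransmissions that discards themselves would cause. The function names are illustrative.

```python
import random

def simulate_arrivals(num_packets, Tint, R, LS, k=16, mean=0.5, seed=1):
    """Arrival time of each packet's first successful copy (ASSUMED model)."""
    rng = random.Random(seed)
    arrivals = []
    for i in range(num_packets):
        send = i * Tint
        while rng.random() < LS:      # this copy is lost: resend after R
            send += R
        # gammavariate(k, mean/k) samples an Erlangian delay with that mean
        arrivals.append(send + rng.gammavariate(k, mean / k))
    return arrivals

def discard_probability(arrivals, n):
    """Fraction of packets arriving while some predecessor n or more
    back (packets i-n, i-n-1, ...) is still missing."""
    latest = []                       # latest[i] = max arrival among 0..i
    running = float("-inf")
    for a in arrivals:
        running = max(running, a)
        latest.append(running)
    discards = 0
    for i, a in enumerate(arrivals):
        if i - n >= 0 and latest[i - n] > a:
            discards += 1
    return discards / len(arrivals)
```

For example, `discard_probability(simulate_arrivals(5000, 0.1, 1.5, 0.05), n)` for increasing n gives a decreasing estimate of Pdis(n, Tint) under the stated assumptions.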

Figure 23 shows Pdis as a function of n for several
values of Tint and LS. The underlying transmission delay
distribution is again assumed Erlangian with mean = 1 and degree
16. R is 1.5, in the optimal range determined in section 3.
Closely spaced transmissions increase the likelihood of out of
order arrival and the discard probability. For small LS, only
alternate routing and internal PSN error recovery (single hop
retransmission) contribute to out of order arrival rates, and a
moderate number of buffers suffices for nearly all resequencing
needs. For higher LS, longer end-end retransmission delays
contribute significantly to less ordered arrivals, and more
buffering is required to avoid discarding packets. Except for
very long transmission intervals (low throughput) or low
variance  transmission  medium  delay  distributions  with  small  LS, 
a significant  fraction  of  packets  are  discarded  with  the 
"simple"  sequencing  protocol  strategy  of  accepting  only  the  next 
packet  in  order. 

Equation  18  and  figure  23  are  based  on  a simple  analytic 
model  of  transmission  medium  delay  characteristics.  Although 
this  model  proves  adequate  for  mean  throughput  and  delay 
analyses in previous sections, the relative arrival sequence of
packets discussed in this section is much more sensitive to
small correlations in delay of neighboring packets, and the
exact shape of the delay distributions that occur in real PSNs.
Therefore the values shown in figures 22 and 23 are more
representative of the shape of the effects to be expected than
of their exact values.

In  fact  Pinord(n)  is  the  sort  of  performance  measure 
that  proves  exceptionally  difficult  to  derive  exactly  for  a 
detailed PSN model. Even the mean interpacket arrival times
under the assumptions of a fixed path and no reordering proved a
formidable problem [Fultz72]. Fortunately Pinord(n) is easily
obtained empirically by comparing arriving packet sequence
numbers with the current expected sequence number (ESN), and
tabulating a histogram of differences. Such information had not
been recorded until recently, when Forgie derived a simpler
global out of order percentage from his data [Forgie75].
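
The empirical procedure just described can be sketched as follows. This is an illustration only: it assumes the receiver buffers every out of order arrival (so the ESN always names the earliest missing packet), that sequence numbers start at 0 and do not wrap, and that duplicates are ignored. The function names are hypothetical.

```python
from collections import Counter

def esn_difference_histogram(arrival_sequence):
    """Tabulate (sequence number - ESN) for each newly arriving packet."""
    received = set()      # out of order packets held for resequencing
    esn = 0               # expected sequence number
    hist = Counter()
    for seq in arrival_sequence:
        if seq < esn or seq in received:
            continue      # duplicate, not counted
        hist[seq - esn] += 1
        received.add(seq)
        while esn in received:        # advance past contiguous packets
            received.remove(esn)
            esn += 1
    return hist

def pinord_estimate(hist, n):
    """Estimate Pinord(n): fraction of arrivals with difference below n."""
    total = sum(hist.values())
    return sum(count for diff, count in hist.items() if diff < n) / total
```

A difference of 0 means the packet arrived in order, so `pinord_estimate(hist, 1)` estimates Pinord(1), and one minus the estimate for a given n corresponds to Pdis for a buffer size of n.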

7. PACKET SIZE

Although  discussed  last,  packet  size  selected  by  the 
protocol  by  no  means  has  the  least  impact  on  performance.  The 
primary  result  of  varying  packet  size  is  to  vary  the  basic 
transmission medium delay f(t). Transmission delay in a PSN has
a large component proportional to packet length because the
transmission time on each hop (between switching nodes) is equal
to packet length divided by bandwidth [Metcalfe73]. Therefore
shorter packets mean lower per packet delay, with ensuing
effects on retransmission, flow control, and buffering. Large
packets are also undesirable because they require a bigger share
of protocol buffer space and a larger slice of available
transmission bandwidth, raising the question of fairness to
other processes sharing these resources.

The  main  counterforce  to  sending  short  packets  is  the 
increase  in  overhead.  Since  header  and  control  information  is 
normally  fixed  length,  a larger  portion  of  available  bandwidth 
is taken up with overhead as packet size shrinks [Kleinrock74].
Furthermore, larger processing overhead and space for associated
linkage and state information is required by the protocol for
each data bit transferred. Maximum throughput attainable
decreases  with  shorter  packet  lengths,  and  cost  increases  since 
the  number  of  packets  or  total  bits  required  to  transmit  a given 
amount of data increases.

Let a letter be an independently meaningful chunk of
data, i.e. the amount of data necessary for the receiver to
begin processing. This may vary from a single character for
some text editing applications, to a large file for a compiler.
The total delay to transmit a large letter first drops with
decreasing packet size because of lower per packet delay, but
then begins to rise again with shorter packets because of
increased overhead. Figure 24 illustrates this for a
representative set of parameters but no packet loss (LS = 0).
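
The tradeoff can be illustrated with a simple store-and-forward model. This sketch is an assumption-laden illustration, not the model behind figure 24: it assumes packets are pipelined over the hops with no queueing or propagation delay, that every packet carries a full data field, and that the header is 100 bits (a hypothetical value). The function name `letter_delay` is illustrative.

```python
import math

def letter_delay(letter_bits, data_bits, header_bits=100, hops=5,
                 bandwidth=50_000.0):
    """Seconds to deliver a whole letter split into equal packets.

    Each hop takes (data + header) / bandwidth per packet; with
    pipelining the last packet arrives after hops + packets - 1
    packet times (ASSUMED model, LS = 0).
    """
    packets = math.ceil(letter_bits / data_bits)
    packet_time = (data_bits + header_bits) / bandwidth
    return (hops + packets - 1) * packet_time
```

With these assumed parameters a 100 kbit letter takes about 22.0 seconds with 10 bit data fields, about 2.3 seconds with 1000 bit data fields, and about 10.0 seconds when sent as a single packet, which reproduces the drop and subsequent rise described above.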

The probability that transmission errors will occur also
rises with packet length, giving an upper limit to achievable
throughput and an optimal packet size for maximum throughput as
shown in [Metcalfe73].

Hence  there  is  no  single  optimal  packet  size  for  an 
interprocess  communication  protocol  to  select  in  all  cases. 
Rather,  optimal  packet  size  depends  on  the  balance  of  user 
requirements  for  delay,  throughput,  cost,  and  letter  length 
[Opderbeck74]. Short data transmissions where each transaction
may be independently processed can take advantage of reduced
delay by using shorter packets, at the price of higher costs and
lower throughput. Real-time traffic requiring moderate delay
and good throughput for moderate letter sizes should use
moderate packet sizes. Minimum cost or optimal throughput users
willing to tolerate longer average delay should use longer
packets. As transmission medium bandwidths rise and error rates
drop, the impact of packet size will be lessened.


FIGURE  24  TOTAL  DELAY  vs.  PACKET  LENGTH  FOR  VARIOUS  LETTER  SIZES 


NETWORK BANDWIDTH = 50 kb/sec
NUMBER OF HOPS = 5
HEADER LENGTH = ?00 BITS
PACKET LOSS PROBABILITY = 0

Packet switching networks have their own similar reasons
for selecting packet sizes for internal transmission. The units
of information accepted from network users (e.g. ARPANET
messages) may be larger or smaller than these internal packet
sizes. The ARPANET offers a larger input limit (8 kbit
messages), choosing to fragment large user submissions into
smaller packets for internal transmission to reduce delay and
switching node buffer requirements [Crowther75]. TYMNET, on the
other hand, collects short user inputs into larger packets for
internal transmission to reduce overhead [Tymes71].

Although  an  interprocess  communication  protocol  in 
general  has  no  control  over  such  internal  PSN  transmission 
decisions,  an  awareness  of  transmission  characteristics  is 
fundamental  to  efficient  protocol  operation  as  we  have  seen. 
For example, delay is not linear with packet (message) sizes
above  1 kbit  in  the  ARPANET  due  to  the  internal  fragmentation  of 
larger  submissions  mentioned  above.  Hence  use  of  larger  packet 
sizes  by  an  interprocess  communication  protocol  on  the  ARPANET 
becomes more attractive.

Chapter IV

NETWORK  INTERCONNECTION 


1.  INTRODUCTION 

As  computer  networks  proliferate,  the  performance  of 
interprocess  communication  protocols  will  receive  increasing 
attention.  Chapters  II  and  III  provide  a basis  for  the  design 
and  analysis  of  protocols  to  meet  particular  reliability  or 
efficiency  performance  goals. 

Another set of questions quickly growing in importance
concerns the interconnection of computer networks. Networks
are  already  developing  on  the  basis  of  geographical  coverage, 
particular  types  of  service  offered,  and  organizational 
coverage.  Users  desiring  access  to  multiple  areas,  services,  or 
organizations  will  need  to  communicate  over  many  of  these 
networks  as  easily  as  possible.  The  general  computing  power  and 
unique  computing  resources  available  on  different  networks  can 
most  efficiently  be  made  available  to  people  and  computers 
attached to other networks by interconnection of networks.

We shall adopt the term Gateway for the interface
between interconnected networks as used in IFIP Working Group
6.1 (1) (INWG) [Pouzin73, Lloyd75a, Cerf73]. As a first model
of network interconnection, we can consider the Gateways as
"supernodes" in a "supernetwork." Individual local nets are
just the "lines" that connect Hosts to supernodes, and
supernodes to each other. The unusual aspect of this
supernetwork is that each line may require a different
communication protocol, so the supernodes must implement the
correct local net protocol for the nets they interface. Even
circuit switched nets could serve as lines in the supernetwork,
although much of the following discussion applies most directly
to packet switching networks (PSNs). While this supernetwork
model is by no means the only sort of interconnection possible,
it provides a simple introduction to the issues involved in
interconnecting heterogeneous computer networks.

(1) IFIP is the International Federation for Information
Processing. Working Group 6.1 is the Internetwork Working
Group.

Even for identical networks, such interconnection is not
a trivial  problem.  As  a minimum,  common  addressing  techniques 
are  needed  so  any  user  on  any  of  the  interconnected  networks  can 
uniquely  specify  any  other  users.  Global  addressing  can  be 
achieved  by  expanding  the  address  space  available  on  each  local 
net  and  changing  previously  identical  local  net  addresses 
(highly  inconvenient  to  the  users  involved),  by  concatenating 
partial  path  addresses  into  a complete  path  specification,  or  by 
instituting a hierarchical address space (e.g. net, local
address) with necessary alterations to routing algorithms.
Section 2 below considers these addressing and routing
alternatives.

In general, communication formats and protocols may
differ between networks to be connected. Hence the primary
function of a Gateway is to translate between formats and
protocols used in each local net. This raises the difficult
problem of choosing an appropriate level of interconnection,
discussed  in  section  3 below. 

As part of the translation process, Gateways must deal
with varying maximum packet sizes in the local nets connected.
When a packet arrives that is too large for the next local net,
the Gateway must fragment the packet before forwarding it
through the next local net. These fragments may be reassembled
as they leave the next local net, or allowed to proceed
independently to their ultimate destination.
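
Fragmentation at a Gateway can be sketched as follows. The packet layout here is hypothetical and is not taken from any protocol specification cited in this thesis: each fragment is assumed to carry the internet destination, the offset of its data within the original packet, and a flag saying whether more fragments follow. Carrying the offset allows a fragment to be fragmented again by a later Gateway and to be reassembled at the ultimate destination.

```python
from dataclasses import dataclass

@dataclass
class InternetPacket:          # hypothetical layout, for illustration only
    dest_net: str
    dest_local: str
    offset: int                # position of data within the original packet
    more: bool                 # True if further fragments follow
    data: bytes

def fragment(packet, max_data):
    """Split a packet so no piece carries more than max_data bytes."""
    if len(packet.data) <= max_data:
        return [packet]
    pieces = []
    for start in range(0, len(packet.data), max_data):
        chunk = packet.data[start:start + max_data]
        last = start + max_data >= len(packet.data)
        pieces.append(InternetPacket(packet.dest_net, packet.dest_local,
                                     packet.offset + start,
                                     packet.more or not last, chunk))
    return pieces

def reassemble(fragments):
    """Rebuild the data, assuming every fragment has arrived."""
    ordered = sorted(fragments, key=lambda f: f.offset)
    return b"".join(f.data for f in ordered)
```

In this sketch `reassemble` may be applied either where the fragments leave the next local net or at the destination Host, matching the two alternatives above.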

Since Gateways are nodes in the supernetwork formed by
connecting individual local networks, they must also support
typical node-to-node communication functions. Flow control and
buffer allocation algorithms are necessary to limit peak loads
and to share resources fairly. Access control, accounting, and
performance monitoring are especially important to Gateway nodes
since relatively independent local networks with potentially
sensitive political and administrative concerns are involved
[Kuo74, Kuo75]. In section 4 below we consider several of these
additional Gateway functions.

Other  typical  node  functions  such  as  error  detection, 
duplicate detection, sequencing, and retransmission may be
performed on a hop-by-hop (Gateway-Gateway) and/or end-end
basis.  Some  of  these,  such  as  retransmission,  may  be  desirable 
hop-by-hop  (at  least  in  high  loss  rate  nets),  while  others  such 
as  sequencing  may  degrade  performance  if  employed  on  each  hop  as 
discussed  in  section  3. 

An important goal of network interconnection strategies
is to require as little alteration as possible to the individual
networks connected. Expressed in some form by many authors,
this goal can be summed up in the following principle:

Local Net Independence Principle: Each local net shall
retain its individual address space, routing algorithms,
packet formats, protocols, traffic controls, fees, and other
network characteristics to the greatest extent possible.

Some important motivations for this goal are:

(1) Local nets have a large investment in existing
implementations which cannot be replaced inexpensively.

(2) Most net traffic will continue to be local net traffic and
it is unfair for all users to suffer the disruption of
service and increased cost of a new implementation that only
serves a minority of users.

(3) Even if technically desirable, political, economic, or
administrative constraints may make changing to global
standards impossible.

(4) From a practical viewpoint, cooperation will be more likely
and completion faster if fewer changes are required.

On the other hand, some global agreements are clearly
necessary for meaningful communication, for example, standard
addressing techniques so users can refer to each other
unambiguously, and common formats so arriving messages can be
correctly interpreted. The goal is to implement such standards
on top of existing local net functions, achieving independence
and universality at the same time.

Reference  is  made  throughout  the  remainder  of  this 
section  to  several  existing  or  planned  networks  as  examples  of 
various  points  discussed.  These  nets  include  ARPANET,  CYCLADES, 
EPSS, TYMNET, ALOHANET, PRNET, UCL, and DCS. Appendix B
provides a list of references relevant to each of these nets for
the reader wishing background information or to further explore
the points raised below.

2. ROUTING AND ADDRESSING

The question of addressing, or how to name all the
participants  in  an  interconnected  communication  system,  is 
intimately  related  to  the  question  of  routing,  or  how  to  find 
paths  from  source  to  destination  and  then  choose  among  them. 
For  example,  a single  level  address  space  requires  each  node 
performing  routing  to  know  the  correct  route  to  every  possible 
destination  independently,  while  a hierarchical  address  space 
allows  routing  nodes  to  know  correct  routes  only  to  destinations 
within  the  local  "area"  and  to  other  areas  (although  such  area 
routing may not be optimal) [McQuillan74].
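
The difference between the two address spaces can be made concrete with a small sketch. The table layout and the names `next_hop` and `table_entries` are hypothetical; the point is only that a node using hierarchical (net, local address) routing keeps one entry per local destination plus one entry per foreign area, instead of one entry per destination in the whole system.

```python
def next_hop(dest, my_net, local_routes, net_routes):
    """Return the outgoing line for dest = (net, local address)."""
    net, local = dest
    if net == my_net:
        return local_routes[local]    # full detail inside the local area
    return net_routes[net]            # a single entry per foreign area

def table_entries(hosts_per_net, my_net):
    """Routing table sizes: (single level, hierarchical) for one node."""
    flat = sum(hosts_per_net.values())
    hierarchical = hosts_per_net[my_net] + (len(hosts_per_net) - 1)
    return flat, hierarchical
```

For three nets with 50, 40, and 30 destinations, a node in the first net would hold 120 entries with a single level address space but only 52 with the hierarchical one, at the cost of possibly non-optimal routes to other areas.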

A large  body  of  literature  exists  on  routing  and 
addressing for individual networks [Baran64, Fultz72,
McQuillan74, Frank71, Farber73], but only recently have the
special problems pertaining to network interconnection been
addressed [Graham71, Farber73, Belloni74, McQuillan74]. Many of
these  problems  stem  from  the  Local  Net  Independence  goal,  i.e. 
the  desire  to  preserve  individual  address  spaces  and  routing 
techniques  within  each  local  net.  This  favors  restricting 
internet  functions  to  Hosts  and  Gateways  rather  than 
implementing  them  within  local  nets  as  we  shall  see  below. 

A number  of  important  concepts  in  addressing  have  proved 
confusing  due  to  conflicting  use  of  terms  by  different  authors. 
Hence  before  continuing  we  present  brief  definitions  of  the 
terms to be used in the remainder of this section.

PSN: Network of packet switches that forward packets of
appropriate format from a source Host to a destination Host.
Various additional services such as sequencing,
retransmission, delivery confirmation, etc. may also be
available.

Host:  Source  and  destination  of  packets  in  a PSN.  For  routing 
purposes,  all  packets  to  a given  Host  are  going  to  the  "same 
place" as far as the PSN is concerned. Hosts may be singly
connected (to a single packet switch) or multiply connected
(to  more  than  one  packet  switch)  in  which  case  optimal 
Host-Host  routing  is  more  complicated. 

Communication  Control  Protocol  (CCP):  As  described  in  chapter 
II,  this  represents  the  end-end  protocol  which  provides 
reliable  interprocess  communication  and  multiplexes  the 
independent  communication  streams  from  many  processes  within 
a Host. Examples of CCPs are TCP [Cerf74b, Cerf74c], NCP
[Carr70], NCAM [Karp72], and Transport Station
[Zimmerman73]. In the simplest cases, there is a one-to-one
correspondence between CCP and Host, and a CCP name is
synonymous with its Host name. (In ARPANET the Host name is
emphasized, and all packets to a Host typically go to a
single NCP.)

In general CCP names are independent of Host names.
There may be multiple CCPs in a single Host (CYCLADES). A
CCP may "move" from one Host to another [Lay73] (although
routing tables must reflect the altered CCP/Host
correspondence). A single CCP may even encompass several
Hosts. (The multi-Host ARPANET installation at BBN
approaches this type of application although at a higher
protocol level.)

Port:  The  ultimate  source  and  destination  of  the  communication 
path provided by a CCP. Each pair of source/destination
ports  represents  a unique  communication  path  (connection, 
association),  so  that  a single  port  may  have  multiple 
connections  to  different  remote  ports. 

Process:  Processes  represent  the  active  computing  tasks  (jobs, 
devices,  users)  in  Host  computer  systems.  Processes  in 
different Hosts (or the same Host) wishing to communicate
with each other must first acquire ports from their local
CCPs. Association of processes and ports in each CCP is
completely a local matter, but a number of "well-known"
ports associated with particular services at each CCP are
useful.

Gateway: The interface between local networks. Although
discussed more fully in the next section, it is important to
note that a Gateway may "look like" a packet switch (follow
PS-PS  protocol  with  local  nets),  or  a Host  (follow  Host-PS 
protocol  with  local  nets). 


2.1  Local  Net  Participation  in  Internetworking 

Having  defined  these  important  terms,  we  return  to  our 
initial  "supernetwork"  model  of  network  interconnection  to 
examine local net functions required to support internet
communication. We assume that CCP and Host names are synonymous
(one CCP, fixed, per Host) and that Hosts are singly connected.
This encompasses a large percentage of practical situations.
Subsequently we shall consider complications resulting when
these  constraints  are  relaxed. 

A source  CCP  (Host)  creates  an  internet  packet 
containing data and a header with necessary control information
for efficient and reliable communication (cf. chapter II). The
header includes a two level internet destination address of the
form (net, local address). This internet packet must be
delivered through the CCP's local net to a Gateway for
forwarding  to  the  destination  net. 

Unfortunately,  an  internet  packet  may  not  be  suitable 
for  direct  transmission  by  local  net  protocols.  Instead,  the 
entire  internet  packet  must  be  presented  as  data  to  the  local 
net protocol, and wrapped in the appropriate local net header
(and perhaps trailer) with the local net address of the Gateway
(see figure 1). This concept of embedding has appeared
frequently in work by members of INWG [Cerf74a, Cerf74c,
Pouzin73, Pouzin74a]. At a Gateway, the local net "envelope" is
removed and the internet packet extracted. The internet
destination  address  can  then  be  used  to  route  the  packet  to 
another  Gateway,  or  to  the  final  local  Host,  and  the  internet 
packet  is  re-embedded  for  transmission  through  the  next  local 
net. 
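
The embedding steps above can be sketched in a few lines. Everything in this sketch is hypothetical: the `InternetPacket` and `LocalFrame` layouts, the `Gateway` class, and its routing table format are invented for illustration and do not correspond to any particular INWG proposal.

```python
from dataclasses import dataclass

@dataclass
class InternetPacket:           # hypothetical internet packet
    dest_net: str
    dest_local: str
    data: bytes

@dataclass
class LocalFrame:               # hypothetical local net "envelope"
    local_dest: str             # local net address (Host or Gateway)
    payload: InternetPacket     # carried as data by the local net

def embed(packet, local_dest):
    """Wrap an internet packet in a local net envelope."""
    return LocalFrame(local_dest, packet)

class Gateway:
    def __init__(self, attached_nets, routes):
        self.attached_nets = set(attached_nets)
        # routes: dest net -> (outgoing net, local address of next Gateway)
        self.routes = routes

    def forward(self, frame):
        """Remove the envelope, route on the internet address, re-embed."""
        packet = frame.payload
        if packet.dest_net in self.attached_nets:
            # final hop: address the destination Host itself
            return packet.dest_net, embed(packet, packet.dest_local)
        out_net, next_gateway = self.routes[packet.dest_net]
        return out_net, embed(packet, next_gateway)
```

Note that the local nets only ever see `LocalFrame` objects addressed to a local Host or Gateway; the internet header travels as uninterpreted data.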

This  embedding  strategy  complies  fully  with  the  Local 
Net Independence principle since no changes at all are required
in  local  net  addressing  or  routing.  In  fact  local  nets  are 
completely  unaware  that  they  are  carrying  internet  traffic. 

The  disadvantage  of  such  strict  adherence  to  local  net 
independence  is  that  source  Hosts  must  generate  the  local  net 
address  of  the  first  Gateway.  Then  each  Gateway  must  not  only 
choose the next net, but also the specific next Gateway and
specify  its  local  net  address.  This  local  address  (and  the  rest 
of  the  local  net  header)  must  travel  with  the  internet  packet 
through  each  local  net,  increasing  overhead.  To  eliminate  these 
disadvantages,  it  is  possible  to  alter  local  net  operation  to 
interpret  the  internet  header  and  address  directly,  avoiding  the 
need  to  embed  internet  packets.  To  this  end,  part  of  the 
internet  header  may  be  reserved  for  local  net  functions,  while 
the remainder is used by the CCP and Gateway for internet
functions. Such an overlapping packet format has been proposed
in recent INWG documents [Cerf75, INWG74, INWG75a] (see figure
2).

Direct  internet  packet  interpretation  would  require  a 
substantial  change  in  established  nets  and  is  probably 
unacceptable. To minimize changes to Host software, only an
escape  or  type  field  may  be  added  to  the  existing  formats  and 
protocols,  so  that  local  and  internet  packets  can  be 
differentiated  and  treated  appropriately.  Of  course  new 
networks  may  be  designed  to  understand  a hierarchical  address 
format,  or  to  implement  multiple  packet  types  from  the  start. 

Even  if  local  nets  do  understand  the  internet  address, 
requiring them to route internet packets based on the internet
destination represents a significant additional burden. When
only one Gateway exists, routing in the local net is trivial.
The "central office" interconnection model [McQuillan74] where
all  internet  traffic  is  routed  to  a local  central  office,  then 
between  central  offices  of  different  nets  (by  special  trunk 
lines),  and  finally  to  a local  destination,  also  presents  simple 
local  net  routing  of  internet  traffic. 

For  reasons  of  reliability  as  well  as  efficiency, 
multiple  Gateways  (or  central  offices)  connecting  networks  are 
desirable in supernetworks of nontrivial size [Weber64]. As
soon as multiple Gateways exist, local net routing and internet
routing lose their independence. The local net cannot choose
the best local Gateway (by whatever criterion) unless it knows
the  cost  of  the  remainder  of  the  possible  routes  from  Gateways 
to  final  destination.  This  represents  another  significant 
breach  of  local  net  independence  since  local  nets  require  global 
routing  information,  and  argues  strongly  that  next  Gateway 
selection  be  performed  in  the  source  Host  and  in  Gateways, 
rather  than  in  each  local  net. 

In  loop  and  broadcast  networks,  each  packet  transmitted 
is available to every node on the network, but the Gateway
selection problem still arises. Either the source must specify
the particular local Gateway to accept the packet in addition to
the  final  destination  address,  or  the  local  net  must  allow 
Gateway  nodes  to  capture  packets  on  the  basis  of  global 
destination  addresses.  The  latter  alternative  again  involves 
local  nets  in  internet  routing  decisions. 

Pierce (1972) has proposed a multi-level hierarchical
loop system where routing of internet packets by network
interface nodes (Pierce’s "C boxes") is particularly simple.
Each  interface  node  connects  exactly  two  loop  nets,  normally  at 
adjacent  levels  in  the  hierarchy.  If  a packet  in  a local  net  is 
destined  for  a different  local  net,  the  interface  passes  it  to 
the  higher  level  "regional"  net.  If  a packet  in  the  regional 
net  is  destined  for  the  attached  local  net,  the  interface  passes 
it down to the local net. The same matching test is performed
at interfaces between regional nets and the "national" net at
the top of the hierarchy. Packets follow an essentially fixed
path up and then down the hierarchy of nets, although Pierce has
suggested alternate routes for failure recovery and direct
connections between local nets exchanging high volumes of
traffic.
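
The matching test performed by an interface node can be sketched as follows. This is a minimal illustration only: the function name, the string net identifiers, and the three action labels are assumptions made for the example and are not taken from Pierce's design.

```python
def interface_action(attached_net: str, dest_net: str, on_local_loop: bool) -> str:
    """Decide what a local/regional interface node does with a passing packet.

    attached_net  -- identifier of the local net this interface serves (hypothetical)
    dest_net      -- identifier of the local net the packet is destined for
    on_local_loop -- True if the packet is circulating on the local loop,
                     False if it is circulating on the regional loop
    """
    matches = dest_net == attached_net
    if on_local_loop:
        # A packet already in its destination net stays there;
        # anything else is passed up to the regional net.
        return "keep" if matches else "pass_up"
    # On the regional loop, only packets for the attached local net are taken down.
    return "pass_down" if matches else "keep"
```

The same test, applied with regional identifiers, would serve at the interfaces between regional nets and the national net.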

Graham  and  Pollack  (1971)  have  presented  another  system 
for  simplifying  internet  routing  in  a more  generally  connected 
(non-hierarchical) system of loop nets. In their proposal,
addresses of all networks are carefully constructed so that the
Hamming distance (1) between addresses corresponds to the path
length between the corresponding nets. When a packet enters a
network  interconnection  node,  its  destination  address  is 
compared  to  the  addresses  of  the  two  nets  connected,  and  the 
packet  is  routed  to  the  net  giving  the  smallest  Hamming 
distance. 
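
The routing decision at an interconnection node can be sketched in a few lines. This is an illustration under stated assumptions: addresses are modeled as plain binary integers, the function names are hypothetical, and ties are broken in favor of the first net, a detail the description above does not specify.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary addresses differ."""
    return bin(a ^ b).count("1")


def choose_net(dest: int, net_a: int, net_b: int) -> int:
    """Return the address of the attached net closer to dest in Hamming distance.

    Ties go to net_a (an assumption made for this sketch).
    """
    return net_a if hamming(dest, net_a) <= hamming(dest, net_b) else net_b
```

For instance, a packet addressed to net 0b111 arriving at a node joining nets 0b110 and 0b000 is sent toward 0b110, which differs from the destination in one bit position rather than three.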

The  drawbacks  of  this  scheme  are  the  length  of  addresses 
required  to  provide  a successful  distance  comparison,  and  the 
sensitivity  of  addresses  to  topology.  Address  length  is 
proportional to the number of nets in the system, n, rather than
log n as with normal addressing. Any change or addition to
network topology requires a new address construction, often
resulting in changes to many existing addresses.


(1) The Hamming distance between two binary numbers is the
number of bit positions in which they differ.

Both of these loop interconnection schemes are aimed at
shortest path routing and are not readily adaptable to provide
routing on other criteria such as bandwidth, delay, or cost (cf.
section 2.2). Routing is essentially fixed rather than adapting
to varying network performance, although some recovery from
total connection failures has been provided.


The need for Gateways to perform a routing function is
not surprising, while the need for source Hosts to do so is.
This need is a direct consequence of the multiplicity of
Gateways in a local net. Remembering the analogy between a
supernet and a local net, a supernetwork with multiple Gateways
parallels a local net with multiply connected Hosts (Hosts
connected to more than one packet switch). Hence we will call
such networks multiply connected networks and note that every
Host in a multiply connected network is thereby (locally)
connected to multiple Gateways.


Currently, multiply connected Hosts are a rarity
(impossible on the ARPANET) so it is unusual to think of Hosts
(CCPs) making routing decisions and exchanging routing data.
For Hosts (or local nets) engaged in internetwork communication,
two levels of routing are required as observed in [Belloni74]
(see figure 3). Internet packets and addresses are generated by
the internet CCP for each connection. Then the internet routing
level selects a Gateway based on the internet destination, and
attaches the local net address of the Gateway. Finally, the
local net routing level selects a packet switch based on the
attached local net address. The packet is then ready for
transmission over the appropriate line using the Host-PS
protocol. Either routing level may be a single fixed choice if
either the network or the Host is singly connected. In the most
general case, a single Host (CCP) may be connected to multiple
networks, in which case the internet routing level selects among
the several local net routing levels.

When multiple CCPs reside in a Host, they may share the
routing  levels  (see  figure  3).  A CCP  generating  only  local 
traffic  requires  only  the  local  routing  level,  while  an  internet 
CCP  requires  both  levels. 
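
The two routing levels in a Host can be sketched as two table lookups applied in sequence. The sketch below assumes fixed tables and invented names (nets "A" through "C", Gateways "G1" and "G2", packet switches "PS1" and "PS2", Host "H5"); none of these identifiers come from figure 3.

```python
LOCAL_NET = "A"  # the net this Host is attached to (hypothetical)

# Internet routing level: remote net -> local net address of the chosen Gateway.
INTERNET_RT = {"B": "G1", "C": "G2"}

# Local net routing level: local net address -> packet switch to hand the packet to.
LOCAL_RT = {"G1": "PS1", "G2": "PS2", "H5": "PS1"}


def route(dest_net: str, dest_host: str) -> tuple:
    """Return (local net address, packet switch) for an outgoing packet."""
    if dest_net == LOCAL_NET:
        # Local traffic needs only the local routing level.
        local_address = dest_host
    else:
        # Internet traffic: the internet level first selects a Gateway
        # and attaches that Gateway's local net address.
        local_address = INTERNET_RT[dest_net]
    # The local level then selects a packet switch for that local address.
    return local_address, LOCAL_RT[local_address]
```

A CCP that generates only local traffic would use the final lookup alone, while an internet CCP passes through both.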


2.2 Routing Data Structures and Control Strategies

Having  established  the  general  outline  of  internet 
routing,  we  now  take  a closer  look  at  the  data  structures  and 
information  exchanges  necessary  to  support  various  routing 
systems.  Routing  data  structures  maintain  the  information  on 
possible  paths  and  their  relative  costs  that  is  needed  to  make 
routing  decisions.  Routing  control  strategies  define  how  this 
information is obtained, updated, and used to make routing
decisions.

Data Structures

The  basic  data  structure  of  a routing  system  is 
typically  called  a routing  table  (RT).  RT  contains  a row  for 
each  possible  destination,  and  a column  for  each  route  leaving 
the  node.  The  entries  of  RT  are  the  "costs"  of  reaching  the 
given  destination  by  the  given  route,  including  special  entries 
meaning the destination is local (directly connected by the
route) or unreachable by a given route. Normally a routing
algorithm selects the route with minimum cost for a given
destination, or gives up (and possibly returns an error message)
if  the  destination  is  unreachable  by  any  route. 
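
A routing table of this form and the minimum cost selection rule can be sketched as below. The encoding is an assumption made for illustration: the table is a dictionary of rows, a local destination is given cost 0, and an unreachable entry is given infinite cost. The destination and route names are invented and do not reproduce figure 4.

```python
import math

LOCAL = 0                 # assumed encoding of the special "destination is local" entry
UNREACHABLE = math.inf    # assumed encoding of the special "unreachable" entry


def select_route(rt: dict, dest: str):
    """Return the minimum cost route to dest, or None if no route reaches it."""
    row = rt[dest]                                  # one row per destination
    route, cost = min(row.items(), key=lambda entry: entry[1])
    if cost == UNREACHABLE:
        return None                                 # give up; caller may send an error
    return route


# One column per route leaving the node (hypothetical values).
example_rt = {
    "A1": {"netA": LOCAL, "netB": UNREACHABLE, "G2": 4},
    "C3": {"netA": UNREACHABLE, "netB": UNREACHABLE, "G2": 2},
    "D9": {"netA": UNREACHABLE, "netB": UNREACHABLE, "G2": UNREACHABLE},
}
```

With several routing objectives, one such table would be kept per objective and the packet's stated criterion would select the table.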

Such  routing  tables  can  model  a large  class  of  routing 
algorithms if the cost is suitably defined [Belloni74]. In
general,  there  may  be  several  routing  objectives  (minimum  delay, 
maximum  bandwidth,  shortest  path,  minimum  charge,  avoid  certain 
nets,  etc.),  each  with  its  own  cost  function  and  RT.  Presumably 
arriving packets indicate in some way by what criterion they
wish to be routed.

Figure  4 presents  an  example  of  interconnected  networks, 
and shows a routing table for Gateway G1 between nets A and B.
The  five  routes  from  G1  include  local  routes  to  packet  switches 
in  nets  A and  B,  and  forwarding  routes  to  other  Gateways  G2  and 
G3.  In  this  example,  the  cost  measure  is  path  length,  resulting 
in shortest path routing.

Dividing the rows and columns of RT on the basis of
"local" or "remote" produces four regions as shown in figure 4.
Each region has a characteristic structure, simplifying the
implementation of RT. The local routes to each net are normally
the best routes to all the local Hosts on that net, giving
region I of RT the simple structure shown in figure 4. Region
II routes are not normally used, having a much higher cost than
local routes to the local Hosts, but if a local Host becomes
unreachable through its local net, it may be reachable through a
longer internet route. A local route can never be a route to a
remote destination (Hosts do not forward traffic), so region III
consists of "unreachable" entries. Region IV represents the
most significant portion of RT for internet routing since the
best path to remote Hosts may go through various Gateways.

An  important  argument  in  favor  of  a hierarchical 
internet  address  space  concerns  the  size  of  routing  tables.  In 
a hierarchical  system,  each  Gateway  (or  internet  Host)  is 
constrained  to  know  only  about  routing  to  Hosts  in  its  local 
nets,  or  to  other  nets.  All  Hosts  on  a remote  net  are 
equivalent  for  routing  purposes.  Hence  RT  may  be  divided  into 
internet  and  local  routing  levels,  with  the  total  number  of  rows 
reduced  to  the  number  of  nets  (internet  level)  plus  the  number 
of  local  Hosts  (local  level)  (see  figure  5). 

FIGURE 5  GATEWAY ROUTING TABLES IN A HIERARCHICAL ADDRESS SPACE
(See FIGURE 4 for network diagram)

Routing with a hierarchical address space is optimal
from source to destination net Gateway, and from destination net
Gateway to destination Host, but may not be fully optimal since
RT cannot distinguish routes on the basis of the destination
Host’s location within the destination net. For example, in
routing packets to Host C3 of figure 4, Gateway G1 thinks routes
through G2 and G3 are equally good because G1 does not know the
internal structure of net C.

With a single level address space, every Host requires a
row in RT. The total number of rows is equal to the total
number of Hosts in all nets, which is probably unacceptable for
even moderate sized supernets. Furthermore, each Gateway
requires information about routing within remote networks,
violating the local net independence principle and requiring
more information to be exchanged by adaptive routing algorithms
(see below). The main advantage of a single level address space
is that routing may be optimal since full information is
potentially available.
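
The difference in table size can be made concrete with a small calculation. The figures below (20 nets of 50 Hosts each, and a Gateway attached to two of them) are invented for illustration, and the sketch assumes the internet level holds one row for every net, including the Gateway's own.

```python
def flat_rows(hosts_per_net: dict) -> int:
    """Rows needed with a single level address space: one per Host anywhere."""
    return sum(hosts_per_net.values())


def hierarchical_rows(hosts_per_net: dict, local_nets: list) -> int:
    """Rows needed with hierarchical addressing at a Gateway on local_nets."""
    internet_rows = len(hosts_per_net)                       # one row per net
    local_rows = sum(hosts_per_net[n] for n in local_nets)   # one row per local Host
    return internet_rows + local_rows


supernet = {f"net{i}": 50 for i in range(20)}   # hypothetical supernet
print(flat_rows(supernet))                            # 1000
print(hierarchical_rows(supernet, ["net0", "net1"]))  # 120
```

Under these assumptions the hierarchical table is roughly an eighth the size of the single level table, and the gap widens as nets are added.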

Another  consequence  of  hierarchical  addressing  concerns 
the  determination  of  unreachability  for  remote  destinations. 
Since  RT  contains  a single  row  for  each  net,  it  is  impossible  to 
determine  the  reachability  of  a particular  remote  Host  so  long 
as its net is still reachable [McQuillan74]. If the remote net
has become partitioned, or the remote Host has died, only the
final Gateway will have this information and be able to discard
a packet destined for the unreachable destination (and possibly
return an error message).

Control Strategies

Another  important  aspect  of  the  routing  system  is  the 
means by which cost entries are computed and updated. McQuillan
(1974) has described four general control strategies. In
non-adaptive or deterministic routing, costs are computed once
and never changed (or at least very rarely, only in response to
major system failures). The other three classes are adaptive
routing strategies: isolated, distributed, and centralized. In
isolated routing, RT entries are periodically updated only on
the  basis  of  traffic  behavior  observed  at  the  node.  No  routing 
information  is  exchanged  with  other  nodes  in  either  isolated  or 
deterministic  routing. 

Centralized  routing  involves  one  (or  more)  routing 
centers  which  collect  traffic  information  from  all  nodes,  and 
generate  RT  updates  for  all  nodes  (TYMNET  is  an  example). 
Finally,  distributed  routing  is  based  on  the  regular  exchange  of 
routing  data  between  adjacent  nodes  (as  in  ARPANET). 
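
One round of such an exchange can be sketched as follows. This is a generic illustration in which each node reports its best known cost to every destination and the receiver adds the cost of the connecting line; it is not a description of the ARPANET algorithm itself, and the function names and table layout are assumptions.

```python
def merge_neighbor_report(rt: dict, neighbor: str, link_cost: float, report: dict) -> bool:
    """Refresh the RT column for `neighbor` from the costs that neighbor reports.

    rt     -- {destination: {route: cost}}, one column per adjacent node
    report -- {destination: neighbor's best cost to that destination}
    Returns True if any entry changed, signalling that this node's own
    report to its neighbors should be regenerated.
    """
    changed = False
    for dest, cost in report.items():
        new_cost = link_cost + cost
        row = rt.setdefault(dest, {})
        if row.get(neighbor) != new_cost:
            row[neighbor] = new_cost
            changed = True
    return changed


def best_costs(rt: dict) -> dict:
    """The report this node would send: its minimum cost to each destination."""
    return {dest: min(row.values()) for dest, row in rt.items()}
```

Repeating the exchange between adjacent nodes spreads cost changes through the network without any central collection point.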

Any  of  these  routing  strategies  could  conceivably  be 
used  for  internetwork  routing.  McQuillan  presents  an  extensive 
discussion  of  each  class  primarily  in  a single  network  context. 
Below, we discuss the application of these results to network
interconnection.

Deterministic routing provides the simplest
implementation, but is overly sensitive to failures (fixed
routing), or very inefficient since it does not adapt to
changing traffic loads and other conditions. A particularly
unfortunate situation can occur with fixed routing when a
failure occurs which partitions a local net, or when a Gateway
fails. The fixed routing algorithm declares some destinations
unreachable because the (fixed) path is broken at some point,
while in reality the destinations may still be reachable through
another Gateway (see figure 6). With fixed routing, overhead
for extended connections may be reduced by assigning a short
name at the beginning of each connection, and setting up
connection tables at each intermediate node on the (fixed)
route. These tables associate the correct outgoing route with
the short name in each arriving packet (TYMNET TYMSAT [Tymes71]
or Bell ESS [Ewin70]). Packets carry only the short connection
name or logical line ID rather than the full destination
address.
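
The connection table mechanism can be sketched as a per-node dictionary filled in at connection set-up time. The class and method names below are hypothetical and the sketch ignores how short names are allocated or released; it only shows that forwarding needs the short name and not the full destination address.

```python
class Node:
    """An intermediate node on a fixed route (illustrative only)."""

    def __init__(self, fixed_routes: dict):
        self.fixed_routes = fixed_routes   # full destination address -> outgoing route
        self.connections = {}              # short connection name -> outgoing route

    def set_up(self, short_name: int, destination: str) -> None:
        """At connection set-up, consult the full address once and record the result."""
        self.connections[short_name] = self.fixed_routes[destination]

    def forward(self, short_name: int) -> str:
        """For each arriving packet, only the short name is examined."""
        return self.connections[short_name]


node = Node({"netC.host3": "line2", "netB.host1": "line1"})
node.set_up(7, "netC.host3")
```

After set-up, `node.forward(7)` yields the outgoing route recorded for that connection.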

Isolated adaptive routing is also very inefficient since
it must keep "probing" alternate routes to detect changes in net
behavior and adapt accordingly [Baran64]. If the probing is
slow, then bad paths will persist for a long time, and if the
probing is fast, a large portion of traffic is purposely routed
on non-optimal paths which may lead to network congestion
problems.

Centralized routing algorithms concentrate the RT update
calculations at a single center with potentially full
information to compute optimal routes. TYMNET and PRNET use
centralized routing with some distributed failure recovery
procedures in PRNET. Unfortunately, centralization is
accompanied by increased sensitivity to failure, high processing
requirements that typically rise with the third power of the
number of nodes [McQuillan74], and higher loading of
communication lines near the routing center with incoming data
and outgoing updates. Political and administrative
considerations also make a central internet routing center
unattractive. (Who will own the equipment, pay for operations,
determine control parameters? Do the individual nets want to
trust a central authority?)

Distributed routing algorithms provide the best
reliability (adaptation to failed components) and efficiency
[Baran64, Fultz72, McQuillan74]. More recently, Agnew (1974)
has shown that distributed routing algorithms can determine
optimal routes, and Naylor (1975) has presented an algorithm
that is guaranteed to be loop-free. One caveat here is that
when the routing algorithm itself fails (routes incorrectly, or
exchanges incorrect routing information), the effects on the
rest of the network can be disastrous. Very stringent
precautions have been taken to prevent such errors in ARPANET
IMPs [McQuillan74], but Gateways and internet Hosts
participating in internetwork routing may not be so well
safeguarded. Deterministic and isolated routing tend to
localize the effect of routing failures since no routing data is
exchanged. Centralized systems would presumably be an order of
magnitude more reliable than the individual nodes in a
distributed system. Hence catastrophic global routing failure
is most to be feared in a distributed internet routing system.


Source  Routing 

Another  routing  strategy  that  does  not  conveniently  fit 
under  any  of  the  above  classes  is  source  routing  where  the 
source  of  internet  packets  specifies  the  complete  internet 
route.  When  the  entire  route  accompanies  each  internet  packet, 
no  routing  decisions  or  tables  are  required  at  Gateways,  but  the 
packet format is complicated and overhead increases. In
particular, the packet must carry a varying number of
intermediate addresses depending on the path and destination
[Farber73]. This overhead may be reduced by setting up a fixed
route with connection tables (see above) when a connection is
established.

The primary advantage of source routing is the
elimination  of  complex  routing  responsibilities  from 
intermediate  nodes.  Instead,  responsibility  for  routing  falls 
on  the  source  nodes  which  must  be  able  to  construct  complete 
routes  to  any  desired  destination.  Source  routing  also 
eliminates  the  need  for  global  agreement  on  network  names,  since 
the  name  of  each  destination  becomes  equivalent  to  a path 
specification for reaching the destination node.

The connection of names and path specifications is
particularly apparent in an addressing scheme outlined by
Crocker (personal communication). In Crocker’s proposal, local
networks are represented by a single switch connecting all local
Hosts (see figure 7). In addition, some lines from the switch
may go to other switches (nets), providing network
interconnection. The path to any destination is the series of
switch addresses (lines) traversed to reach that destination. A
local path is one address long, while paths to remote Hosts
require an additional address element for each switch traversed.
Figure 7 gives several examples of path specifications.

If each local net (switch) has a globally agreed name,
then individual Hosts may be specified by their net name and
local Host number, independent of the path(s) available to reach
them. However, such global address agreements are not necessary
if source routing is used, since any Host may still be addressed
by specifying a path to it. This simplifies addition of new
networks, or replacement of a single Host by a network, because
the new nodes may be addressed by adding one more address
element to existing path specifications. For example, if the
network containing Host E is attached as shown in figure 7, then
a path from Host A to Host E is

With a hierarchical address space and routing by
Gateways, addition of a new network requires global agreement on
the network name, and insertion of a new row for that network in
all Gateway routing tables. With source addressing, only
sources accessing a new net (switch) need to know the new
topology.

A disadvantage  of  specifying  destinations  by  their  path 
names  is  that  the  "name"  of  a destination  depends  on  the 
location  of  the  source.  Two  different  Hosts  talking  to  the  same 
third  party  may  have  different  paths  to  and  hence  different 
names  for  the  same  destination.  This  situation  is  similar  to 
dialing a special prefix from an "inside" phone line, or the
regular 3-digit prefix from an "outside" phone line to reach the
same  phone. 

Crocker’s  single  switch  network  model  applies  most 
clearly to loop networks and other fully connected nets
(broadcast transmission). Farber and Vittal (1973) have
proposed a similar source routing approach for interconnection
of multiple DCS type loop networks. In addition to specifying
its  destination,  each  packet  normally  identifies  its  source. 
Crocker  and  Farber  have  described  similar  means  for 
progressively  converting  the  destination  path  specification  into 
a return  path  to  the  source  as  the  packet  traverses  successive 
path  elements.  Each  time  the  packet  leaves  a node  or  switch, 
the  return  address  of  the  node  is  appended  to  the  end  of  the 
path  specification.  Whenever  the  packet  reaches  an  intermediate 
destination,  the  corresponding  address  is  removed  from  the  head 
of the path specification. For example, figure 7 shows seven
points on the path between Hosts A and D. The path
specification at each point is

(i)    (8,5,1)    Start
(ii)   (8,5,1,6)  Return address added
(iii)  (5,1,6)    First destination reached
(iv)   (5,1,6,6)  Return address added
(v)    (1,6,6)    Second destination reached
(vi)   (1,6,6,5)  Return address added
(vii)  (6,6,5)    Final destination reached

When the packet reaches its final destination, the resulting
path specification is the path back to the source in reverse
order. If the line between switches has the same address (name)
in both switches, then the two address transformations described
above become a simple cyclic shift (e.g. point iv to point vi).
If an error occurs at some intermediate point, the partially
transformed path specification may still be reversed to
correctly return an error message to the source.
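
The two transformations can be sketched directly. The sketch below reproduces the seven points of the example from the initial path (8,5,1) and the return addresses (6,6,5) that appear in the listing; the function names are hypothetical, and the sketch assumes one return address is supplied for each element of the outgoing path.

```python
def traverse(path, return_addresses):
    """Return the path specification at every point along the route.

    path             -- destination path set by the source, e.g. (8, 5, 1)
    return_addresses -- address appended on leaving each node or switch
    """
    if len(path) != len(return_addresses):
        raise ValueError("need one return address per path element")
    spec = list(path)
    points = [tuple(spec)]            # start
    for return_address in return_addresses:
        spec.append(return_address)   # leaving a node: append its return address
        points.append(tuple(spec))
        spec.pop(0)                   # reaching the next destination: drop the head
        points.append(tuple(spec))
    return points


def return_path(final_spec):
    """The final specification holds the way back in reverse order."""
    return tuple(reversed(final_spec))


points = traverse((8, 5, 1), (6, 6, 5))
```

Here `points` holds the seven specifications listed above, ending in (6,6,5), and `return_path(points[-1])` gives (5,6,6) as the route from D back to A.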

Source  routing  simplifies  routing  at  intermediate  points 
by  placing  all  responsibility  for  route  selection  at  the  source. 
When the source corresponds to a human user (perhaps accessing
remote computing services from a terminal), the user establishes
the initial route using whatever criteria he desires, and may
update the route in response to observed performance.
Unfortunately, sources communicating with many destinations may
need to know the topology and performance of much of the
internetwork system in order to construct successful routes.
Typically less information is available to evaluate alternative
routes, and changes must be infrequent (particularly for short
addressing with its essentially fixed route), so non-optimal
routes will result. Although shortest path routing may be
reasonably amenable to source specification, other routing
criteria such as bandwidth, delay, or cost may be highly dynamic
and more difficult to project from the source.

2.3  Conclusions 

As a minimum for viable network interconnection, at
least the following standards must be accepted by all
participants:

(1) A global name space. In a hierarchical addressing system,
local names within each network may remain unchanged. With
source routing, globally agreed names are convenient but not
necessary.

(2) Common internet routing. For all techniques this requires
common address formats in internet packets and specification
of routing criteria (if more than one is available).
Distributed routing also involves standard routing data
exchanges and routing decision algorithms.

In addition to these necessary conditions, we make the
following recommendations:

(1) To preserve local net independence, embedding internet
packets for transmission through local nets with internet
routing by Gateways is preferable to direct local net
interpretation of internet packets.

(2)  Hierarchical  or  area  addressing  is  preferable  to  a single 
level  global  name  space.  The  shorter  routing  tables, 
routing data exchanges, and local net independence gained
outweigh  the  potential  loss  in  routing  optimality. 

(3)  Internet  Hosts  as  well  as  Gateways  must  participate  in 
internet  routing  (but  see  section  3.3  below  on  internet 
service  sites). 

(4)  Fixed  internet  routing  is  too  unresponsive  to  failures  to  be 
acceptable  unless  special  failure  detection  and  recovery 
mechanisms  are  added.  Distributed  routing  is  more  robust 
and efficient but requires a standard universal
implementation. However, distributed routing strategies are
subject to catastrophic global failures when an individual
node’s routing process malfunctions.

(5)  Source  routing  is  most  appropriate  where  greater  source 
participation in route selection and non-optimal routing are
acceptable in order to simplify routing at intermediate
nodes or to allow more general addressing.


3. LEVEL OF INTERCONNECTION

Another basic question of network interconnection
concerns the level at which networks are to be connected. A
great deal of confusion is apparent in past discussions of this
issue because different authors mean different things by the
commonly presented alternatives, "Host" or "packet switch"
level. This section attempts to clarify the concept of
interconnection level by identifying three distinct issues often
confused in discussions of network interconnection. Alternative
approaches to each of the three issues are explored in the
following subsections with conclusions at the end of each
subsection.

We maintain that at least the following three concepts
represent distinct and important considerations in a coherent
discussion of network interconnection level (see figure 8).

Local  Net  Interface  Level 

Here we consider that a Gateway can interface to local
nets either as a packet switch (employs PS-PS protocol to
communicate with local net) or as a Host (employs Host-PS
protocol to communicate with local net). This is the most
common meaning of interconnection level. In the packet switch
case, a Gateway must behave like a normal switching and
forwarding node in the local net, while in the Host case, a
Gateway behaves like a source/destination of local net packets
(see definitions in previous section). We argue that the first
alternative is neither desirable nor in general even possible.

FIGURE 8  INTERCONNECTION LEVEL ISSUES

LOCAL NET INTERFACE LEVEL:
• Packet Switch
• Host

LOCAL NET SERVICE LEVEL:
• Datagram
• Virtual Call
• Bulk Data Transfer
• Interactive Terminal

IMPLEMENTATION APPROACH:
• Endpoint
• Hop-by-Hop

Local  Net  Service  Level 

If a Host level local net interface is chosen, a Gateway
can make use of various local net service levels for
transmitting packets through a local net. Levels include a
simple "datagram" or best effort service, or a more reliable
"virtual call" service. Other special service levels such as
bulk data transfer or interactive terminal handling may also be
available (see section 3.2 below for definition of service
levels).

Endpoint vs. Hop-by-Hop Protocol Implementation

A desired end-end service may be implemented two ways.
The Endpoint approach consists of implementing suitable control
algorithms at each end of the communication path, while
employing various service levels on each hop. (For example an
end-end virtual call service can be provided with a SPAR (cf.
section II-4) type protocol at each end while using datagram
services in each local net.) The Hop-by-Hop approach provides
the desired end-end service without additional end-end protocol
by requiring the desired service level on each hop (each local
net), and joining the hops together with any necessary
translation performed in the Gateways. (For example, the same
end-end virtual call service may be implemented by joining the
SPAR type protocols available on each local net to each other at
intermediate Gateways.) The main trade-off here is end-end
controls vs. hop-by-hop controls. As we shall see below,
Hop-by-Hop sacrifices some flexibility but partially avoids the
need for a common internet protocol.

Treating  each  of  these  three  concepts,  a coherent 
analysis of network interconnection trade-offs becomes possible.
Table  1 summarizes  the  classification  schemes  used  by  other 
authors  in  terms  of  these  three  issues  so  their  results  can  be 
more  easily  compared. 

3.1  Local  Net  Interface  Level 

Interconnecting networks through a packet switch level
interface (PSLI) may appear to be the simplest strategy since
the resulting "Catenet" [Pouzin73] appears to be one large
network without any complicating hierarchical structure. In
fact, PSLI is fraught with difficulties and has not yet been
demonstrated to be feasible.

PSLI requires local nets to route internet traffic to
appropriate local net Gateway/packet switches. Because each
Gateway looks like a packet switch of the local net, there is no
way for a source Host or Gateway to specify the address of the
next Gateway as a destination. Furthermore, local packet
formats must be expanded to allow for internet addressing
capability. Both of these requirements represent a serious
breach of local net independence as discussed in section 2
above.

Table 1

Network Interconnection Alternatives of Other
Authors in Terms of Our Classification Scheme

                      Local Net    Local Net    Implementation
                      Interface    Service      Approach
                      Level        Level

Pouzin [Pouzin73]
  "Catenet"           PS
  Host Level          H            VC           HH
  "Super-network"     H            ->           ?

Davies [Davies73]
  "packet"            PS           —            E
  "host"              H            VC           HH

UCL [Lloyd75a]
  "Switching node"    PS
  "Parallel Host"     H            D            E
  "Series Host"       H            VC           HH

BBN [Binder75]
  "packet"            PS
  "host"              H            D            E

Cerf [Cerf74b]
  "host"              H            D            E

H=Host, PS=Packet switch, D=Datagram,
VC=Virtual call, E=Endpoint, HH=Hop-by-Hop,
— =does not apply, ?=unknown

The  main  difficulty  of  PSLI  stems  from  widely  varying 
PS-PS  protocols  in  different  nets.  A Gateway  must  somehow  make 
the  rest  of  the  Catenet  look  like  an  extension  of  the  local  net 
it  serves.  The  problem  of  mapping  adjacent  net  PS-PS  protocols 
can  be  considered  in  two  parts  since  a PS-PS  protocol  involves 
some functions every node performs, and other functions only
performed  by  source  and  destination  nodes. 

Universal node functions include error detection,
retransmission, duplicate removal, and routing. It is
reasonable to assume that a Gateway/packet switch can
appropriately check and generate checksums for error detection
and retransmission, and sequence numbers for duplicate removal.
However, some nets may not perform node-node duplicate
filtering, while others may rely on it to avoid delivering
duplicates to end users. In nets that exchange routing data
with adjacent nodes or a central authority, the Gateway/packet
switch will have to generate appropriate routing data for the
rest of the Catenet, or actually translate routing data between
local nets (highly improbable given the variety of routing
practices in use). Worse problems arise with specialized node
functions such as packet tracing, remote debugging, statistics
gathering, call charging, etc. Gateway/packet switches will
have to isolate requests for and responses from these special
services to individual nets.

Even if difficulties with universal node functions could
be overcome, many local nets provide extensive additional
functions at source and destination packet switch nodes. EPSS
requires a complicated call set-up, buffer allocation, flow
control, and end-end acknowledgement scheme to be enforced by
end nodes. A call between EPSS and another net would require
all the appropriate fields and responses to be generated at the
final destination, or at the EPSS Gateway (in which case the
Gateway is behaving like an endpoint of the communication path,
or a Host, not a simple packet switch). Similarly, ARPANET
nodes perform storage reservation, RFNM generation, message
reassembly, and message sequencing, all of which would have to
be performed compatibly in the remote net. Many nets also
perform accounting functions in end nodes so fees can be
collected. Complying with any one local network's version of
these additional functions is a formidable problem, while
attempting to provide all of them in every net is clearly
absurd.

Pouzin (1973, 1974a, 1974b) and Davies (1973) have been
the main proponents of PSLI, citing the following advantages:

(1) Local nets will provide a simple packet transmission
service without additional end node functions, making
translation easier.

(2) The translation required in a Host level Gateway using
the Hop-by-Hop approach may be more difficult than a PS-PS
protocol mapping (see section 3.3 below).

(3) The "super-network" alternative requires global
agreements that will be difficult and time consuming to
reach.

Point 1 appears to be the weakest since many current
networks employ quite complex end-end functions (the new ARPANET
type 3 messages [Walden74] do provide a simpler transmission
service). Pouzin deals with this by suggesting that extra
services can be "masked out" at the Gateway/packet switch so
only basic packet forwarding is maintained across the Catenet.
However, this is precisely the case where a Gateway performs end
node functions to cauterize local net idiosyncrasies and hence
is acting like a Host.

Point 2 correctly notes some of the difficulties of a
Hop-by-Hop approach, but these may be reduced in an Endpoint
implementation (see section 3.3). Point 3 applies primarily to
providing more powerful end-end services such as virtual calls,
in which case common protocols are also required with PSLI.

Interfacing a Gateway to a local net as a Host, on the
other hand, has the following advantages:

(1) The Host-PS interface is typically simpler than the
PS-PS interface in local nets.

(2) Local net independence is maintained because local nets
do not need to know other net protocols as required with
PSLI. Each local net protocol "stops" at the Gateway.
All internet functions are implemented in Hosts and
Gateways on top of the local net transmission services.

(3) Local nets have greater control over traffic entering
from other nets since internet traffic enters from a
Host.

(4) Host-PS protocol implementations typically exist for a
wide range of machines, providing a headstart and a wide
choice for Gateway implementation, while packet switch
implementations typically exist only for a single
special purpose machine.

We conclude that from both practical and theoretical
viewpoints, heterogeneous network interconnection using a Host
level interface is preferable to a packet switch level
interface. The following sections explore other network
interconnection questions assuming that Gateways use a Host
level interface to local nets.


3.2 Local Net Service Level


Given that Gateways are interfaced to local nets as
Hosts, several levels of service are typically available for
transmitting internet packets through each local net, including
datagram, virtual call, bulk data transfer, and interactive
terminal services. These services may be provided by the PSN
itself, or by additional protocols implemented in the Host
computers. More powerful services require more complex
protocols in the Host (or PSN), and more control fields in each
packet transmitted (although reduced addressing may be available
on fixed route connections). A brief description of each major
service level follows with references for further details.


Datagram: [Walden74, Cashin74, Canada75, MacPherson75] Fixed
maximum length messages are transmitted fairly reliably to the
specified destination. Some messages may be lost or duplicated.
Messages may also be delivered out of order. Delivery
confirmation (ACK) may be available. In general no error
messages are returned for lost messages or inaccessible
destinations.


Virtual Call: [Carr70, Zimmerman75, Cerf74c, Cashin74, Canada75,
MacPherson75, Pouzin75] This corresponds to the service level
provided by Communication Control Protocols discussed in chapter
II. Arbitrarily long (or at least much longer than datagram)
letters are reliably transmitted to the specified destination.
Letters are never lost, damaged, duplicated, or delivered out of
sequence (as long as the protocol is functioning correctly).
Flow control may also be provided. Error messages are returned
for inaccessible destinations.

Letters may be relatively long such as pages of a file
and require fragmentation by the source CCP into smaller
segments for transmission through the local net, or may be quite
short as with interactive use. The important features of
virtual calls are the guaranteed reliability, sequencing, and
flow control.


Bulk Data Transfer: [Crocker72, EPSS75, Lloyd75a] This service
is specially designed to facilitate exchange of large amounts of
data (files) between end users. The virtual call service
probably provides the basic transmission facility, while special
file access modules are added at each end to provide for
convenient manipulation of files, to chop files into records
(letters) for transmission, and to reassemble records at the
destination. Overhead is minimized by using long records
(although limits must be imposed to avoid monopolization of net
services).


Interactive Terminal Service: [Crocker72, EPSS75a, Tymes71] A
large proportion of computer network traffic has been and will
likely continue to be terminal access to remote service systems.
Virtual call service again provides the basis for terminal
service, but because typical transmissions are short, it may
prove advantageous to multiplex several sets of terminal traffic
into a single physical transmission. Some code conversion and
echoing control or other features may be added to the basic
virtual call service [ARPANET Telnet].

Other special services such as graphics, remote job
entry, or work load sharing may well prove useful and become
more prevalent in the future. The concept of service levels may
be pictured as a tree, with datagram (addressing and error
detection) at the root, growing up through retransmission,
duplicate detection, sequencing, and flow control into the
virtual call level, and then branching into various special
purpose services.
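
The tree of service levels can be sketched as a small data
structure. The Python fragment below is only an illustration: the
table layout and the function name mechanisms_for are assumptions
introduced here, and the mechanisms listed for the two special
purpose branches are paraphrased from the descriptions above
rather than taken from any protocol specification.

```python
# Illustrative sketch only: the names and layout are assumptions,
# not part of any protocol specification.

# Each service level names its parent and the mechanisms it adds.
SERVICE_TREE = {
    "datagram": (None, ["addressing", "error detection"]),
    "virtual call": ("datagram", ["retransmission", "duplicate detection",
                                  "sequencing", "flow control"]),
    "bulk data transfer": ("virtual call", ["file access modules"]),
    "interactive terminal": ("virtual call", ["multiplexing",
                                              "echo control"]),
}


def mechanisms_for(service):
    """Collect mechanisms from the root (datagram) up to the given service."""
    chain = []
    level = service
    while level is not None:
        parent, added = SERVICE_TREE[level]
        chain.append(added)
        level = parent
    # chain was built leaf-to-root; reverse it so the root comes first
    return [m for added in reversed(chain) for m in added]


if __name__ == "__main__":
    print(mechanisms_for("interactive terminal"))
```

Walking from a leaf back to the root makes explicit that every
special purpose service inherits all of the virtual call
mechanisms, which in turn rest on the datagram functions.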


3.3 Endpoint vs. Hop-by-Hop Protocol Implementation

Section 3.2 has described several service levels
typically available for interprocess communication on local
networks. To provide convenient communication between processes
on Hosts in different networks, a similar spectrum of end-end
service levels is desirable. Since datagram service is likely
to be even less reliable across multiple networks, and since
special purpose services are likely to be built upon virtual
call service, the following discussion focuses on alternatives
for providing end-end virtual call level service.

The basic choice for implementing various end-end
services is between Endpoint and Hop-by-Hop or stepwise
[Pouzin75] approaches:

The Endpoint approach: build the necessary control
mechanisms to provide the desired service level at each end,
requiring a minimum of service from the local nets in
between, or

The Hop-by-Hop approach: use existing protocols on each hop
(local net) to provide the service level desired, and
connect the hops. Success depends on the transitivity of
service: if hop A and hop B provide the service, then their
connection hop AB does also. Achieving this transitivity
may require a nontrivial translation between protocols.

To make these alternatives more concrete, we consider
the details of providing an end-end virtual call service using
both approaches. Such a Hop-by-Hop implementation currently
exists between ALOHANET, ARPANET, and UCL, while the following
Endpoint implementation example connecting ARPANET, CYCLADES,
and PRNET is still under development.

Endpoint Implementation

In the Endpoint approach, all Hosts must implement a
standard internet CCP (or use local net protocols to access an
internet service facility as described below). The internet CCP
may be implemented in addition to an existing local net CCP (see
figure 9). The internet CCPs produce internet packets which are
embedded in local packets for transmission through a local net
to a Gateway. A choice of local net service levels is possible
for this purpose. In the ARPANET, Hosts and Gateways may
communicate using the Host-Host protocol, a virtual call level
connection. Alternatively, Hosts and Gateways may converse
using the Host-PS protocol directly (in parallel with the NCP).
Using "regular messages" this constitutes a weak virtual call
type facility since the subnet performs sequencing and error
correction. Using "type 3 messages" [Walden74] provides a
datagram service.
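
The embedding of internet packets in local packets can be
illustrated with a short sketch. Everything named here is
hypothetical: the text does not define packet formats, so the
field names, the address strings, and the functions embed and
extract are assumptions chosen only to show that the local net
carries the internet packet as opaque data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InternetPacket:
    # Field names are hypothetical; no packet format is given in the text.
    source: str        # global address of the sending process
    destination: str   # global address of the receiving process
    sequence: int      # used by the end-end CCP, ignored by local nets
    data: bytes


@dataclass(frozen=True)
class LocalPacket:
    local_destination: str   # address meaningful only inside one local net
    payload: InternetPacket  # embedded internet packet, carried opaquely


def embed(packet, local_destination):
    """Wrap an internet packet for transit through one local net."""
    return LocalPacket(local_destination=local_destination, payload=packet)


def extract(local_packet):
    """At a Gateway or destination Host, recover the internet packet."""
    return local_packet.payload


if __name__ == "__main__":
    p = InternetPacket("netA/host1", "netB/host2", 0, b"hello")
    print(extract(embed(p, "gateway-1")) == p)
```

The internet packet is unchanged by the round trip, so each local
net needs to understand only its own header.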

Which local net service level provides the most
effective total system when combined with the Endpoint CCP
control mechanisms? Sequencing is undesirable in local nets
since it will be performed at the destination, and increases
delay in each hop (cf. section II-G) as well as requiring a
single exit Gateway from the local net for all packets of a
connection. Hop-by-Hop error correction is more efficient than
Endpoint [Metcalfe73], but for low local net error rates, the
difference is small while the saving in protocol complexity from
eliminating retransmission on each hop may be substantial. If
Hop-by-Hop retransmission is employed, Gateways must store
packets until acknowledged by the next Gateway, requiring more
buffer space than simply forwarding packets. Finally, some form
of flow control between Gateways may be necessary, although
network capacity may adequately limit traffic levels in some
local nets.

The ARPANET is an extremely low loss rate net, so there
is little to be gained by local net retransmission (a service
not offered by the Host-Host protocol in any case). Kleinrock,
Naylor, and Opderbeck (1974) have also shown that overhead can
be significantly reduced in the ARPANET by using a suitable
Endpoint protocol with datagram network service rather than the
network's virtual call (Host-Host) protocol. Hence the best
choice in ARPANET is to use the datagram local net service,
relying on the end-end internet CCP to provide additional
services. The situation is similar in CYCLADES where both
datagram and virtual call services are available. In PRNET
where internal packet loss rates are expected to be high, local
retransmission appears desirable.
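
A rough calculation illustrates why the choice depends on the
loss rate. The sketch below is not taken from the cited analyses;
it assumes a simplified model in which every hop loses a packet
independently with the same probability, acknowledgements are
never lost, and only data transmissions are counted. The function
names are introduced here for illustration.

```python
def hop_by_hop_transmissions(hops, loss):
    """Expected hop transmissions when every hop retransmits locally.

    Each hop needs on average 1 / (1 - loss) attempts.
    """
    return hops / (1.0 - loss)


def end_to_end_transmissions(hops, loss):
    """Expected hop transmissions when only the source retransmits."""
    if loss == 0.0:
        return float(hops)
    delivered = (1.0 - loss) ** hops        # one attempt crosses every hop
    per_attempt = (1.0 - delivered) / loss  # expected hops used per attempt
    return per_attempt / delivered          # attempts are geometric


if __name__ == "__main__":
    for loss in (0.001, 0.01, 0.2):
        print(loss,
              round(hop_by_hop_transmissions(3, loss), 4),
              round(end_to_end_transmissions(3, loss), 4))
```

Under these assumptions the two strategies differ by only a few
percent over three hops when the loss probability is 0.01, while
at a loss probability of 0.2 end-end retransmission costs
noticeably more, which is consistent with preferring local
retransmission only in high loss nets.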

Once a local net service level has been selected to carry
internet packets, the internet packets next arrive at a Gateway.
The Gateway receives internet packets via the relatively simple
local net datagram protocol in most cases, and extracts the
internet packets (see figure 9). The Gateway is free to send
different packets from a single connection over different
internet routes since the Endpoint CCP will sequence them.
Packets arriving at a Gateway need not be sequenced before
forwarding. If a local route fails during the course of a
connection, alternate routes may be used without causing end-end
errors.
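
Because Gateways forward packets without sequencing them, the
destination Endpoint CCP must put them back in order and discard
duplicates. The following is a minimal sketch of such a
resequencing step, assuming unbounded integer sequence numbers
and no limit on buffering; a real CCP would bound both with a
window. The class name Resequencer is hypothetical.

```python
class Resequencer:
    """Destination-side reordering of packets for one connection."""

    def __init__(self, first_sequence=0):
        self.next_expected = first_sequence
        self.held = {}  # out-of-order packets keyed by sequence number

    def receive(self, sequence, data):
        """Accept one packet; return the data now deliverable in order."""
        if sequence < self.next_expected or sequence in self.held:
            return []  # duplicate: already delivered or already buffered
        self.held[sequence] = data
        ready = []
        # Release the longest run of consecutive packets now available.
        while self.next_expected in self.held:
            ready.append(self.held.pop(self.next_expected))
            self.next_expected += 1
        return ready


if __name__ == "__main__":
    r = Resequencer()
    for seq, data in [(1, "b"), (0, "a"), (1, "b"), (2, "c")]:
        print(seq, r.receive(seq, data))
```

Packets that took different internet routes and arrived out of
order are held until the gap is filled, so route changes inside
the supernet remain invisible to the receiving process.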

Let us call the portion of a Gateway associated with
each local net a Gateway "half" (see figure 10). A Gateway half
contains a local net interface (implements the necessary local
net protocols) and other modules to perform internet routing,
fragmentation, and any Gateway-Gateway functions (such as
retransmission or flow control). The local net interface side
of each Gateway half is unique to each local net, while internet
packets and routing functions at the other side of each half are
global. Hence a Gateway need not be a single physical device,
but may consist of separate Gateway halves for each local net,
with their internet sides tied together by a simple
communication line (see figure 9). Each local net could
implement its Gateway half on a fully owned and controlled
machine, or even as additional code on an existing Host, with
different nets' Gateway halves connected by an arbitrary line
control procedure for exchange of internet packets and routing
data. Broadcast satellite links are particularly attractive for
this purpose since they offer full connectivity and high
bandwidth.


[Figure 10. A Gateway "half". Labels, in order: other Gateway
halves; internet routing (pick Gateway) and fragmentation;
Gateway-Gateway functions; local net protocol and local routing;
local net packet switches.]

Each Gateway half contains a routing table (RT) to
perform internet and local routing as described in section 2
above. Now however, each Gateway half only maintains routes to
its local net, other Gateways on its local net, and other
Gateway halves (nets) to which it is connected. Routes to
different Gateways in an adjacent net may be merged into a
single route to the adjacent net because a Gateway half need not
know the internal structure of other nets. For example, figure
11 shows the routing tables for the local net A half of Gateway
G1 in the supernet of figure 4. G3 is another Gateway in local
net A, and hence still appears as a route in the internet
routing table. Gateway G2 is in adjacent net B, and hence is
included in the route to net B rather than having a separate
entry in the routing table. Internet routing now occurs in two
steps: an internet packet arriving from the local net is first
passed to the best adjacent net (Gateway half), and then the
best Gateway in that net is selected by the receiving Gateway
half.
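
The two-step routing procedure can be sketched as follows. The
tables below are hypothetical and are not copied from figure 11:
only the facts that G3 is another Gateway on local net A and that
G2 lies in adjacent net B come from the text, while nets C and D,
the table layout, and the function name route are assumptions
made for illustration.

```python
# Hypothetical routing tables, one per Gateway half, keyed by the local
# net the half belongs to. Each entry maps a destination net to one of:
#   ("local", net)     deliver inside this half's own local net
#   ("gateway", name)  forward across the local net to another Gateway
#   ("half", net)      hand over to the attached half in an adjacent net
HALVES = {
    "A": {"A": ("local", "A"), "C": ("gateway", "G3"),
          "B": ("half", "B"), "D": ("half", "B")},
    "B": {"B": ("local", "B"), "D": ("gateway", "G2"),
          "A": ("half", "A")},
}


def route(halves, arriving_net, destination_net):
    """Return the routing decisions taken for one internet packet."""
    decisions = []
    # Step 1: the half on the arriving net picks the best adjacent net.
    kind, target = halves[arriving_net][destination_net]
    decisions.append((arriving_net, kind, target))
    if kind == "half":
        # Step 2: the receiving half picks the best Gateway in its own net.
        kind, onward = halves[target][destination_net]
        decisions.append((target, kind, onward))
    return decisions


if __name__ == "__main__":
    print(route(HALVES, "A", "D"))
    print(route(HALVES, "A", "C"))
```

The net A half never names G2: it knows only that nets B and D
are reached through the attached half in net B, which then makes
the choice of Gateway within its own net.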

Another advantage of implementing Gateways as
independent halves concerns software development costs for
internet Hosts. Noting that internet Hosts must perform routing
functions identical to those in Gateways, Binder (1975) has
suggested that a "logical" Gateway exists between each Host
engaged in internetworking and the local net. Internet Hosts
must cooperate in other Gateway-Gateway functions such as flow
control and retransmission when used, just as Gateways. Hence
Gateway halves and internet Hosts on a local net can use the
same modules to perform these common functions. In the Gateway,
internet packets arrive from other local nets, while in internet
Hosts they are generated by an internet CCP with its additional
end-end protocol functions (see figure 12).

Hop-by-Hop Implementation

Next we consider the Hop-by-Hop approach to implementing
an end-end virtual call service over several networks. In this
case, no internet CCP is required at source and destination
Hosts. Instead each local net must provide a virtual call level
service, with Gateways translating between each local net
service (see figure 13). To facilitate this translation, a
number of universal virtual call protocol functions can be
identified:

(a) Set up call: the willingness of both parties to
communicate is established, and various parameters such as
letter length, window size, buffer allocation, byte size,
echo mode, and abbreviated addressing are agreed on.

(b) Terminate call: the connection is broken either
immediately or after any letters in progress have been
delivered.

(c) Send a letter.

(d) Receive a letter.

(e) Signal an interrupt, reset, or attention.

(f) Obtain status information on letters pending or call
parameter values.

[Figure 13. Hop-by-Hop]

Consideration of this list indicates that translation
may be difficult. In some cases similar but incompatible
services are provided by different local net protocols, such as
letter based, byte based, or line based flow control. The
difficulties of interfacing different flow control mechanisms
are frequently not appreciated [UKPO74, Stokes75]. Other
services such as status, echo control, or interrupt may not be
provided at all by some local nets. In this case the Gateway
must simulate compliance locally without being able to obtain
the service at the ultimate destination. In general this
reduces internet services to the subset of services offered by
all local nets, or requires the end user to be aware of what
services he is "really" getting depending on the particular
local nets traversed.
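
The reduction to a common subset amounts to a set intersection
over the nets on the path. The sketch below uses invented service
sets for two unnamed nets; it does not describe the services of
any actual network, and the function names are assumptions.

```python
def end_to_end_services(path):
    """Services genuinely available end to end over a list of local nets.

    Each element of path is the collection of services one net offers.
    """
    services = None
    for offered in path:
        services = set(offered) if services is None else services & set(offered)
    return services or set()


def masked_services(requested, path):
    """Requested services that some Gateway would have to simulate locally."""
    return set(requested) - end_to_end_services(path)


if __name__ == "__main__":
    # Invented service sets for two hypothetical local nets.
    net_x = {"send", "receive", "interrupt", "echo control"}
    net_y = {"send", "receive", "interrupt", "status"}
    print(sorted(end_to_end_services([net_x, net_y])))
    print(sorted(masked_services({"send", "status"}, [net_x, net_y])))
```

Every additional net on the path can only shrink the common set,
which is why the services a user actually receives depend on the
particular nets traversed.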

An alternative to "masking" services without
counterparts in subsequent local nets as above, is to add the
missing services with extra modules outside the local net
protocol. This may be suitable between Gateways of a local net,
but internet Hosts must also add the modules to existing
protocols, resulting in a modified interface for internet
communication after all. Although the addition will be smaller
than a full Endpoint CCP, there is a quantum jump in
inconvenience when any change to the user interface is
necessary.

To provide a concrete example of Hop-by-Hop
implementation, consider interconnecting ALOHANET, ARPANET, and
UCL where this approach has been used [Binder74, Higginson75].
A major difference of these implementations from the Endpoint
approach is that a user is required to explicitly set up the
call on each hop. For example, an ALOHANET user may first
connect to his local Gateway using ALOHANET protocol, then
connect to UCL using ARPANET Telnet protocol, and then command
the UCL Gateway to connect him to the IBM360. Once the
connection is established, data is forwarded automatically by
each Gateway.

Both ALOHANET and UCL Gateways have developed a similar
set of commands/functions to those listed above, to facilitate
interconnection of the several nets or computers connected to
each Gateway. Some special functions (e.g. interrupt) are
automatically translated by both Gateways. Other functions are
only implemented in some nets (e.g. echo control in ARPANET) or
some Gateways (e.g. Status and Operator functions at UCL) and
cannot be translated or automatically forwarded. These
functions must be explicitly requested by using "escape"
characters interpreted by one of the Gateways (rather than
passed on as data). As noted above, the end user also sets up
each hop of the connection by using command packets to
communicate with each Gateway in turn.

It may be possible to automate connection establishment
by having Gateways forward a standard call set-up packet, but it
will still be necessary for a user to specify an entire path or
at least the final internet destination in some nonstandard (to
the local protocol) fashion. Unfortunately, many special Telnet
services have no direct counterpart in ALOHANET or UCL nets, so
automatic translation of such services is problematic. Similar
difficulties must be expected with the implementation of other
special purpose protocols such as bulk data transfer where local
net differences are likely to be greater (for example see
[Stokes75]).

Another difference in the Hop-by-Hop approach (for
virtual call service) is that a single internet path (of
Gateways) must be used between source and destination. When one
hop fails or malfunctions, end-end service is affected since
there are no corrective end-end control mechanisms.

As a final difference, each new Gateway or network added
with the Hop-by-Hop approach presents a unique translation
problem between the protocols involved. Acceptance of virtual
call protocol standards may simplify this problem somewhat. On
the plus side, only bilateral agreement between connecting
networks is required for translation of local protocols.

We can now make a number of comparisons between Endpoint
and Hop-by-Hop interconnection strategies described above. UCL
has been the primary advocate of a Hop-by-Hop approach, citing
the following advantages [Lloyd75a, Lloyd75b, Higginson75]:

(1) Only bilateral agreement is required, allowing immediate
development and implementation, while Endpoint requires
difficult and time-consuming multilateral agreements.

(2) Existing local net protocols are employed to full advantage
and no new internet CCP is required, reducing software
development and user accommodation to a minimum.


Point 1 is well taken and argues strongly in favor of
Hop-by-Hop as a quickly available interim implementation. Point
2 requires closer scrutiny considering both internet Hosts and
Gateways. For Gateways, point 2 seems to be incorrect. As
shown above (for end-end virtual call service), the Hop-by-Hop
Gateway requires a local net virtual call protocol, plus a
unique (to each network pair) translation between local net
protocols, or mapping into some set of universal call functions.
The Endpoint Gateway requires a smaller, simpler, local net
datagram protocol (with possible additions of some locally
desirable extra functions), and no translation. Endpoint
Gateway "halves" for a given local net are all identical,
avoiding the unique translation effort required in the
Hop-by-Hop approach. Both types of Gateway require internet
routing, fragmentation, accounting, and other auxiliary
functions discussed in the next section. The Endpoint Gateway
does not contain "a complete internet Host for each attached
network" as stated in [Lloyd75], but only the internet routing
and related functions common to Gateways and internet Hosts. In
particular, there is no end-end internet CCP in the Endpoint
Gateway (unless the Gateway is also an internet Host). In
summary, Endpoint Gateways appear to require less total
software, and less new software development than Hop-by-Hop
Gateways.

Within Hosts, point 2 appears more justified since the
Hop-by-Hop approach does maintain existing local net protocols.
However, significant user intervention has been necessary to
complement local protocols for purposes of internetworking as
discussed above.

Several considerations reduce the difficulty of Endpoint
internet Host implementations requiring an internet CCP. In new
networks, internet standards can be implemented from the start.
In existing networks, the development of high level language
implementations of standard Gateway functions may reduce local
net development efforts to local net interface portions (which
already exist for local Hosts). The internet routing and
fragmentation portion of Gateway functions are identical in all
Gateways and internet Hosts, and may be transportable from other
implementations.

From the user's point of view, an internet CCP may
provide an interface that is similar to existing local net
protocols so changeover should be facilitated. Where adoption
of an internet CCP at a particular Host site is not possible
(inadequate facilities, fixed system, effort not justified by
expected use), connection through existing local protocols to an
internet service site may be used (see figure 14). The internet
server might provide all CCP and Gateway functions, with the
local net serving essentially as an access line between
processes and (remote) CCP. The local net access line may
degrade performance of the CCP which is no longer fully end-end.
This strategy shares some of the disadvantages of a Hop-by-Hop
approach, but allows Hosts with limited facilities (intelligent
terminals, packet radio units) to make use of internetwork
facilities.

As a more robust compromise, individual Hosts may
provide an internet CCP but not the additional Gateway functions
(primarily internet routing). In this case the Hosts transmit
locally generated internet packets to a simpler internet service
facility which provides Gateway functions (a kind of central
office for routing and supernet control, or perhaps a normal
Gateway). Some sacrifice in routing optimality and robustness
compared to a complete internet Host may result.

Another technique for reducing the burden of
internetwork communication facilities on Host resources is to
place the CCP in a front-end processor. This moves most
communication functions to the front end, leaving the Host
primarily with a distribution function (of packets to
processes). A Host-front end protocol must also be defined and
supported in the Host [Padlipsky74], but this is typically much
simpler than a complete CCP since Host and front end are closely
coupled. The internet service site described above may be
considered a type of front end that is less closely coupled to a
Host. Front ending may also provide advantages in implementing
local net protocols [Newport72, Feinroth73, Benoit74], but
further discussion of front ending is beyond the scope of this
work.

Cerf et al. have been the main proponents of Endpoint
interconnection strategies [Cerf73, Cerf74a, Cerf74b, Cerf74c].
Davies (1973) has also argued the advantages of an Endpoint
approach although he discussed them in conjunction with a packet
switch level Gateway. Advantages of the Endpoint implementation
(for an end-end virtual call service) include:

(1) Greater flexibility and reliability since alternate internet
paths may be used and path failures are recovered by end-end
controls.

(2) Smaller, simpler Gateways.

(3) A single Gateway function module may be used at all Gateways
and internet Hosts on a local net.

(4) Addition of new nets or supernet topology changes are easy
because the interface between Gateway "halves" is universal.
The internet side of Gateway halves for any local nets can
be connected without special modifications.

(5) Fragmentation is simpler and more flexible (see next
section).

(6) Acknowledgements and control functions are truly end-end.

(7) The internet CCP provides a uniform user interface that
"really" provides specified virtual call services.
Hop-by-Hop must mask (fake) some services, augment existing
user interfaces, or require explicit user intervention at
individual hops.

(8) The same Gateway implementation and local net services can
be used to provide different end-end internet services by
providing different protocols at the ends in the internet
CCP. For example bulk data transfer or a different virtual
call service could be implemented in internet Hosts with no
change to Gateways. In the Hop-by-Hop approach, different
local net protocols and a new translation between them in
the Gateway would be required for each end-end service.

i 

In conclusion, Endpoint and Hop-by-Hop interconnection
strategies appear best suited for different situations. Both
merit further experimentation to verify projected advantages and
difficulties. (Such experimental plans are outlined in
[Binder75], [Lloyd75a], and [Cerf74b].) A Hop-by-Hop approach
appears most suitable for backward compatibility (see
[Burchfiel75]), minimum effort, or immediate need applications,
while an Endpoint approach offers greater robustness and
generality but requires substantial standards and new software
development. Development costs should be reduced by the
universal applicability of internetworking modules in the
Endpoint approach.


The central questions of internet routing, addressing,
and Gateway implementation alternatives have been discussed in
the previous two sections. This section considers a number of
additional Gateway functions important to network
interconnection.

Fragmentation

A great deal of discussion has surrounded the issue of
fragmentation. One way to avoid the difficulties of differing
local net packet size limits is to establish a standard minimum
maximum packet size for all nets, and never use packets larger
than the standard for internetwork communication.

Such a standard appears ill-advised for several reasons.
Since network performance depends heavily on packet size,
networks designed for different purposes will have good reasons
for different local packet sizes, and agreement on a standard
length will be difficult. Too small a standard results in high
overhead for internetwork packets with typically long headers
(lengths over 250 bits have been proposed), while a long
standard will be difficult for some nets. Nets with smaller
limits might be able to fragment packets at entry and reassemble
them at exit to comply with the standard. However, this
sacrifices the flexibility of alternate routing, which becomes
impossible when all fragments must exit through the same Gateway
in order to be reassembled. Finally, rapidly changing
technology, including high bandwidth digital circuits and
broadcast transmission, will certainly dictate revision of
optimal packet sizes.

In general, a source CCP is obliged to fragment large
letters for transmission into its local net. With Hop-by-Hop
network interconnection, a Gateway must translate between
fragmentation schemes in each local net, possibly further
fragmenting oversize fragments or letters. Where such
translation is impossible, the Gateway may have to reassemble
each letter and refragment it compatibly with the next local net
protocol.

Endpoint network interconnection allows a simpler and
more flexible fragmentation strategy. Individual nets may use
arbitrary packet sizes, while Gateways fragment oversize
internet packets at entrance to a local net, and reassembly of
fragments occurs at the destination CCP. The internet header
includes fields to control reassembly of letters at the
destination CCP. It is quite straightforward to allow a Gateway
to (further) fragment internet packets by using the same fields
as the source CCP. The entire internet header is copied for
each fragment created, with alterations only to indicate the new
text length and order of each new fragment. Fragments can then
be forwarded independently to the destination where they are
reassembled exactly as if generated by the source CCP.
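
The header-copying rule can be illustrated with a minimal Python
sketch. The packet layout is an assumption made only for this
example: the field names (letter_id, offset, last) are
hypothetical and are not taken from any of the cited proposals.
The sketch shows that a source CCP and a Gateway can use the same
routine, and that the destination reassembles without knowing
where fragmentation took place.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class InternetPacket:
    """Hypothetical internet packet; field names are illustrative only."""
    source: str
    destination: str
    letter_id: int   # identifies the letter being reassembled
    offset: int      # position of this text within the letter
    last: bool       # True if this packet carries the end of the letter
    text: bytes


def fragment(packet: InternetPacket, max_text: int) -> List[InternetPacket]:
    """Split a packet so that no fragment carries more than max_text bytes.

    The header is copied for every fragment; only the fields giving the
    position, length, and end-of-letter mark of the new text are altered.
    """
    if max_text <= 0:
        raise ValueError("max_text must be positive")
    if len(packet.text) <= max_text:
        return [packet]
    pieces = []
    for start in range(0, len(packet.text), max_text):
        chunk = packet.text[start:start + max_text]
        is_final_piece = start + max_text >= len(packet.text)
        pieces.append(replace(
            packet,
            offset=packet.offset + start,
            last=packet.last and is_final_piece,
            text=chunk,
        ))
    return pieces


def reassemble(fragments: List[InternetPacket]) -> bytes:
    """Rebuild one letter at the destination from fragments in any order."""
    ordered = sorted(fragments, key=lambda f: f.offset)
    expected = 0
    for frag in ordered:
        if frag.offset != expected:
            raise ValueError("missing or overlapping fragment")
        expected += len(frag.text)
    if not ordered or not ordered[-1].last:
        raise ValueError("end of letter not yet received")
    return b"".join(f.text for f in ordered)
```

Calling fragment() once at the source and again at a Gateway
with a smaller limit yields fragments that reassemble() handles
identically, which is the property claimed above. Duplicate
detection and retransmission are deliberately left out.
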

Cerf (1973, 1974c) has proposed a fragmentation control
scheme based on byte sequencing of data on a connection, while
McKenzie (1974) has proposed a similar scheme based on larger
units of data. Smaller units allow smaller fragments of a
letter to be transmitted and finer grained flow control, but
require longer packet identifier and acknowledgement fields
[Day75a]. Longer fields are required to ensure reliability
because sequence numbers must not be reused for a maximum packet
lifetime (cf section II-4), and small units consume sequence
numbers at a faster rate for a given throughput. LeMoll (1975)
has suggested a hierarchical fragment notation which would be
carried in a separate field of the internet header. This
fragmentation scheme would operate independently of other
aspects of the end-end protocol, in contrast to the Cerf proposal
where the sequence number field serves both reliability (cf
section II-4) and fragmentation purposes. Any of these schemes
appears suitable for Endpoint interconnection strategies where
the Gateways and internet Hosts all share a uniform
fragmentation mechanism.
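
The tradeoff between unit size and field length can be made
concrete with a rough calculation. The Python sketch below
assumes only that the sequence space must be at least as large as
the number of units sent during one maximum packet lifetime; it
ignores window size and any safety margin, and the throughput and
lifetime figures in the usage note are hypothetical.

```python
import math


def min_sequence_bits(throughput_bits_per_s: float,
                      unit_bits: int,
                      max_lifetime_s: float) -> int:
    """Smallest field width whose sequence space is not exhausted
    within one maximum packet lifetime at the given throughput."""
    units_consumed = math.ceil(
        throughput_bits_per_s * max_lifetime_s / unit_bits)
    # (n - 1).bit_length() equals ceil(log2(n)) for any integer n >= 1.
    return max(1, (units_consumed - 1).bit_length())
```

For example, at an assumed 1,000,000 bits per second and a 60
second lifetime, sequencing 8 bit units needs
min_sequence_bits(1_000_000, 8, 60) = 23 bits, while sequencing
1024 bit units needs only 16 bits.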

Although adoption of such a scheme allows arbitrary
local net packet sizes, it complicates selection of optimal
packet sizes for internet communication. Presumably small
packets (from interactive traffic) will traverse all nets on a
one-to-one basis with no complications. High throughput
applications, on the other hand, tend to use large packet sizes
to reduce overhead. In this case, passing through even a single
"small packet" net may cause degradation, since once packets are
fragmented in the small packet net, they are not reassembled.
All the fragments must be carried through subsequent nets which
might have accepted the original packets more efficiently, or at
lower cost. In such cases a user may wish to forego the added
robustness of independent fragment propagation in favor of local
net fragmentation/reassembly.

Cost minimization may prove a very slippery problem, as
shown by the following example. Fees in a PSN may be charged
per packet (regardless of length), or per bit. Ignoring other
factors such as distance or service level, cost may be measured
as packets per bit (of data) or total bits per bit (of data).
Assume a user in net A wishes to send a large amount of data to
another net A user. Suppose net A allows 4000 bit packets while
net B allows only 1000. The internet packet header is 200 bits
long. Strategy 1, minimizing both cost measures in net A, is to
send maximum length packets with 3800 bits of data.

Now suppose a net A user wants to communicate with a net
B user. Strategy 2 attempts to optimize transmission of
fragments through net B by sending 3200 bits of data in each net
A packet, allowing fragmentation into four full packets in net
B. Table 2 shows the packet and bit costs resulting from the
two net A packet sizes, assuming charges are the same in both
nets. Strategy 2 reduces the packet cost as expected, but
increases the bit cost (the overhead reduction in net B is
offset by the increase in net A).

In general, neither the route followed nor the fragment
sizes generated on a communication path are constant or even
known. Hence such attempts at cost or performance optimization
become quite difficult. Fortunately, cost differences are small
if a reasonably large maximum packet size is available in all
nets (a packet size ten times the internet header size has an
overhead of 0.11 for full packets).
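
The 0.11 figure follows if overhead is taken as header bits
carried per data bit in a full packet, which is the reading
assumed in this short Python check. The function name is
hypothetical.

```python
from fractions import Fraction


def header_overhead(packet_bits: int, header_bits: int) -> Fraction:
    """Header bits carried per data bit in a full packet."""
    data_bits = packet_bits - header_bits
    if data_bits <= 0:
        raise ValueError("packet must be longer than its header")
    return Fraction(header_bits, data_bits)
```

With the 200 bit header of the example, a 2000 bit packet gives
header_overhead(2000, 200) = 1/9, or about 0.11. By the same
measure, the 1000 bit packets of net B give 1/4 and the 4000 bit
packets of net A give 1/19.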


Accounting

As indicated above, local net fees may be based on
several factors including number of packets, number of bits,
distance, connect time, and service level (reliability,
bandwidth, delay). Presumably each local net has effective
techniques for recording charges and collecting fees from local
Hosts. With network interconnection, the Gateways present a new
source of traffic that must be charged. Packet switch level
Gateway interfacing presents great difficulties here since local
nets do not normally charge traffic from adjacent packet
switches.

Local nets are normally equipped to charge traffic from
Hosts, making Host level Gateway interfacing preferable for
accounting purposes. Nevertheless, a local net is likely to
charge on the basis of total traffic from the Gateway Host
without distinguishing the sources of such traffic. Hence it is
up to the Gateway "half" for each local net to monitor traffic
levels from each source of interest. For outgoing (from the
local net) traffic, the Gateway may wish to separately account
for traffic from each local Host (including other Gateways).
For incoming traffic (from other nets), discrimination on the
basis of net may be adequate. Outgoing traffic from one net's
Gateway half becomes incoming traffic to the Gateway halves of
connected networks, so each local net authority will be in a
position to verify charges from adjacent nets.
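
The bookkeeping described above might be sketched in Python as
follows. The class and function names are hypothetical, and
charging by bits is only one of the bases listed earlier. The
point is that the tallies kept independently by two Gateway
halves can be compared.

```python
from collections import Counter


class GatewayHalfAccounts:
    """Traffic tallies kept by one Gateway half (a hypothetical sketch)."""

    def __init__(self, local_net: str):
        self.local_net = local_net
        self.outgoing_bits = Counter()  # keyed by local Host (or Gateway)
        self.incoming_bits = Counter()  # keyed by the net the traffic left

    def record_outgoing(self, local_host: str, bits: int) -> None:
        self.outgoing_bits[local_host] += bits

    def record_incoming(self, source_net: str, bits: int) -> None:
        self.incoming_bits[source_net] += bits


def carry(sender: GatewayHalfAccounts, receiver: GatewayHalfAccounts,
          local_host: str, bits: int) -> None:
    """Account for one packet crossing from sender's net to receiver's."""
    sender.record_outgoing(local_host, bits)
    receiver.record_incoming(sender.local_net, bits)
```

After any sequence of carry() calls between halves for nets A
and B, the sum of A's outgoing tallies equals B's incoming tally
for net A, so either authority can check the other's figures.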

In  establishing  internetwork  charges,  other  existing 
internetwork  communication  systems  (Post  Office,  telephone, 
telegraph)  provide  well  developed  examples.  Presumably  each 
local  net  authority  will  collect  both  local  and  internet  fees 
from  local  users,  exchanging  accumulated  internet  charges  with 
other  nets  periodically.  If  charges  depend  on  the  internet 
route  (number  or  identity  of  nets  traversed),  users  may  desire 
the  option  to  control  routing,  or  at  least  to  specify  "minimum 
cost"  routing.  This  may  significantly  complicate  the  routing 
algorithms.


Status Monitoring and Reporting

In section 2 on routing, the "reachability" of local
Hosts was mentioned as an important part of the routing data
base. Each Gateway half is responsible for knowing the
reachability of the Hosts (CCPs) on its local net. This is
particularly desirable for virtual call protocols which
retransmit to achieve reliable communication. A "destination
inaccessible" error message must be returned to the source CCP
in order to quench useless retransmissions, and even more
importantly, as information for the user. Useful subtypes of
such a message might specify the level of failure (Net, Gateway,
Host, CCP, Process, Port), and the reason (nonexistent, dead,
busy) [Cerf74b].

Local nets often maintain Host accessibility data as
part of their local routing procedures. In local nets where
internet packets are embedded, the local net may generate an
error message and return it to the Gateway which was the local
net source of the packet in error. The local net cannot signal
the internet source directly since the local net is not aware of
the internet header and internet signalling conventions. On the
basis of a local net error message, the Gateway can mark a local
Host as inaccessible, and return internet error messages for
subsequent packets destined for the inaccessible destination.
Such remote signalling is more difficult with Hop-by-Hop network
interconnection where control signals must be translated between
local net protocols as discussed in section 3. The Gateway may
also determine Host accessibility on its own by periodically
exchanging status packets with local Hosts or monitoring
internet traffic to local Hosts.
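
The marking and signalling behavior just described can be
sketched in Python. The level and reason subtypes are those
suggested above; the message layout, class names, and method
names are assumptions made for this illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Level(Enum):
    NET = "Net"
    GATEWAY = "Gateway"
    HOST = "Host"
    CCP = "CCP"
    PROCESS = "Process"
    PORT = "Port"


class Reason(Enum):
    NONEXISTENT = "nonexistent"
    DEAD = "dead"
    BUSY = "busy"


@dataclass(frozen=True)
class DestinationInaccessible:
    """Hypothetical internet error message returned to the source CCP."""
    destination: str
    level: Level
    reason: Reason


class GatewayHalf:
    def __init__(self) -> None:
        self.inaccessible: Dict[str, Reason] = {}

    def local_net_error(self, host: str, reason: Reason) -> None:
        """The local net reported that a packet for host failed."""
        self.inaccessible[host] = reason

    def host_is_up(self, host: str) -> None:
        """A status exchange or observed traffic shows the Host is back."""
        self.inaccessible.pop(host, None)

    def forward(self, destination: str) -> Optional[DestinationInaccessible]:
        """Return None to deliver normally, or the error to send back."""
        reason = self.inaccessible.get(destination)
        if reason is None:
            return None
        return DestinationInaccessible(destination, Level.HOST, reason)
```

Once a local net error has been recorded, every later packet for
that Host draws an immediate error message, so the source CCP
can stop retransmitting instead of waiting for its quit time.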

Other status information such as loads, traffic levels,
expected delays, charges, availability schedules, current users,
current line conditions, etc. may be maintained by local nets,
Hosts, or Gateways, and made available by internet status
inquiries. Some local net status information is directly usable
at the internet level (for example accessibility data as above),
while other local net data may not be useful (for example
transmission error messages when no Hop-by-Hop retransmission is
employed). Special services such as tracing, echoing, or
discard may also be defined at Gateways and internet Hosts.


Appendix A

CONNECTION ESTABLISHMENT PROOFS


In this appendix we present a proof that the protocol
initialization mechanisms discussed in chapter II correctly
establish a connection for reliable communication between the
processes served by the protocol. The proof extends the methods
of Gilbert and Chandler (1972) to distributed systems with "thin
wire" interprocess communication [Metcalfe72]. Gilbert and
Chandler defined the "composite state" of a system as the state
of each process in the system plus the value of shared variables
in a common memory. Since communication protocols interact by
exchanging messages rather than shared variables, this model is
not directly applicable.

We define a similar composite state of a Communication
Control Protocol (CCP) as the state of the protocol process on
each side of the connection, plus any "relevant" packets in the
transmission medium between them. The two protocol processes
are modeled as (identical) state machines operating
independently except for the explicit exchange of packets.
Synchronization of the two processes is achieved when one CCP
waits for a particular type of packet from the other CCP.
Transitions from one composite state to another are derived from
the state transitions of the individual protocol machines
(Gilbert and Chandler's "partial rules").

Another major difficulty in applying state models to
communication protocols is the large number of states that must
be considered for a protocol of even modest complexity (cf
section 1.2 of chapter II). We overcome this problem by
severely limiting the number of packets which must be considered
part of the composite state. In a straightforward approach,
every packet from the time the composite system was "created"
would have to be represented. By considering the protocol's use
of sequence numbers and control packets, all packets in the
transmission medium can be classified as either "current" or
"old" packets. Since we are primarily interested in worst case
analysis, we assume that any old packet may be delivered to a
protocol machine at any time (limited by maximum packet
lifetime). Hence only current packets must be explicitly
represented as part of the composite state.

Our composite state model helps answer several
interesting questions about the reliability of connection
establishment, such as:

(1) Forbidden states or state sequences: Does the protocol
ever reach undesirable states (e.g. accepting old duplicate
data)?

(2) Deadlock: Does the protocol reach a state where each
side is waiting for the other and neither can proceed?

(3) Halting: Does the protocol eventually establish a
connection once it sets out to do so?

(4) Failure consequences: When one side of the protocol
fails (for example due to a Host crash), does the protocol
deadlock, or recover?

To proceed with the analysis, we first provide a
detailed description of the protocol for connection
establishment in the form of a state machine with context. This
machine is based on the mechanisms described in section 5.2 of
chapter II. Next we present the composite state diagram
resulting from this protocol machine, and demonstrate its
reliability. Finally we analyze one of the simpler connection
establishment procedures presented in section 5.2 to show its
weaknesses.


A.1 Protocol Machine Model

We describe the main functions of the CCP process on
each side of a connection with a state machine (see figure A-1).
A state machine model includes states (represented as circles in
figure A-1), transitions from one state to another (represented
as directed arcs), events which cause the transitions (written
above horizontal bars in the figure), and actions associated
with each transition (written below events).
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FIGURE  A • 1 “3  WAY  HANDSHAKE**  PROTOCOL  MACHINE 
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Four  major  states  are  necessary  to  model  the  3 May 
handshake  connection  establishment  mechanism  (cf  section 
II-5.2); 

Not Active (NA): The protocol is not initialized and a
minimum of information about the connection is maintained

SYN Received (SR): A SYN control packet has been received,
and a SYN-Verify returned to the remote protocol, but the
final ACK (third part of the 3 way handshake) has not been
received

SYN Sent (SS): In response to an OPEN command from a
process, a SYN control packet has been sent to the remote
protocol machine, but no SYN-Verify response (second part of
the 3 way handshake) has been received

Established (ES): The 3 way handshake is complete and the
protocol is initialized for reliable data communication

In addition to the major state, each CCP maintains a
context of additional information about a connection, including
sequence numbers and pending packets as described in section
II-4, parameter values such as flow control window size, quit
time, or retransmission timeout, and internal timers. Use of
this context information in the protocol machine model reduces
the number of states required to represent protocol operation
and further simplifies the composite state analysis.

Events activating the protocol machine include packets
arriving from the transmission medium, commands from the local
process, and internal timeouts. The events relevant to
connection establishment are:

Packet Arrivals (without error)
    SYN: a SYN control packet
    SYN-Ver: a SYN-Verify control packet
    Data: a Data packet
    ACK: an Acknowledgement packet
    NACK: a negative acknowledgement (of a SYN or SYN-Verify)
    Reject: a Reject control packet

Process Commands
    OPEN: open a connection

Internal Timeouts
    Retrans: a pending packet requires retransmission
    Quit: the Quit time for a pending packet has expired
    Retry: the collision retry timeout has expired

All of these events have been described in sections 4 and 5 of
chapter II except the Reject control packet, which has been added
to allow a process to refuse attempts to establish a connection.

The operation of the protocol is represented by the
transitions and their associated actions. The current state,
the event occurring, and the context together determine the
action to be taken and the next state. Figure A-1 shows the
transitions normally occurring during connection establishment.
For completeness, the occurrence of all events in each state
must be considered. Since this would result in an overly
complex drawing, tables A-1 to A-4 provide a complete
description of the transitions from each of the four states.
Each transition is named for later reference with the two letter
code of the current state followed by an integer. Then the
event causing the transition is given, followed by the next
state, the action taken, and any relevant context tests. Events
causing the same action and next state are grouped under a
single transition name.
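
As an illustration of how state, event, and context select a
transition, the Not Active row set of table A-1 can be written as
a small Python function. This is a sketch only: the Transition
tuple and the boolean busy flag (standing in for the context test
of NA2) are assumptions of the example, and actions are given as
descriptive strings rather than carried out.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    name: str        # two letter state code followed by an integer
    next_state: str
    action: str


def not_active(event: str, busy: bool = False) -> Transition:
    """Transitions from the Not Active (NA) state, after table A-1."""
    if event == "SYN":
        if busy:
            return Transition("NA2", "NA", "send Reject referencing SYN")
        return Transition(
            "NA1", "SR",
            "send SYN-Verify referencing SYN; remember seq. no. of SYN")
    if event == "SYN-Ver":
        return Transition(
            "NA3", "NA", "discard; send NACK referencing SYN-Verify")
    if event in ("Data", "ACK", "NACK", "Reject"):
        return Transition("NA4", "NA", "discard as old")
    if event in ("OPEN", "Retry"):
        return Transition("NA5", "SS", "pick ISN and send SYN")
    if event in ("Quit", "Retrans"):
        # NA6: no packets are pending in this state, so no timer can fire.
        raise ValueError("NA6 cannot occur: no packets pending")
    raise ValueError(f"unknown event {event!r}")
```

The same event (SYN) yields NA1 or NA2 depending on context,
which is why the tables list context tests beside each action.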

The set of operations (detect event, take appropriate
action, move to new state) are assumed to be atomic or
uninterruptable so that no confusion can result from nearly
simultaneous events. In a real implementation, this may require
some sort of lock facility to defer later events while the
operations triggered by an earlier event are completed.
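
One possible form of such a lock facility is sketched below in
Python, using a mutual exclusion lock from the standard threading
module. The class, its two entry transition table, and the log
are hypothetical; the table covers only an OPEN followed by a
SYN-Verify and treats every other event as a transition to the
same state.

```python
import threading


class ProtocolMachine:
    """Serializes event handling so each step is effectively atomic."""

    # Deliberately tiny table for illustration; not tables A-1 to A-4.
    _NEXT = {("NA", "OPEN"): "SS", ("SS", "SYN-Ver"): "ES"}

    def __init__(self) -> None:
        self.state = "NA"
        self.log = []                    # (before, event, after) triples
        self._lock = threading.Lock()

    def handle(self, event: str) -> None:
        # Later events wait here until the earlier event's detect,
        # act, and move-to-new-state steps have all completed.
        with self._lock:
            before = self.state
            after = self._NEXT.get((before, event), before)
            self.state = after
            self.log.append((before, event, after))
```

Because the whole step runs under the lock, the log always forms
a single chain in which each transition starts from the state
the previous one ended in, however the events are interleaved.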

The transitions are based on the description of
connection establishment mechanisms in section II-5,
particularly the 3 way handshake. The simple collision recovery
technique is used because it allows a simpler model.
Retransmission and Quit times apply to control packets (SYN and
SYN-Verify) as described for data packets in section II-4.

Reject packets are returned in response to a SYN
(transition NA2) when the receiver is unwilling to establish a
connection. When the Reject reaches the initiating CCP, the CCP
can cease retransmitting SYN and notify its local process of the
reason for rejection (transition SS7). The transmission medium

Table A-1

3-Way-Handshake Protocol State Transitions
from Not Active State

Name  Event           Next   Action and Context
                      State

NA1   SYN             SR     Send SYN-Verify referencing SYN.
                             Remember seq. no. of SYN.
NA2   SYN             slf    Send Reject referencing SYN if busy.
NA3   SYN-Ver         slf    Discard, send NACK referencing SYN-Verify.
NA4   Data, ACK,      slf    Discard as old.
      NACK, Reject
NA5   OPEN, Retry     SS     Pick ISN and send SYN.
NA6   Quit, Retrans   slf    Does not occur since no packets pending.

slf = self, SR = SYN Received, SS = SYN Sent


Table A-2

3-Way-Handshake Protocol State Transitions
from SYN Received State

Name  Event           Next   Action and Context
                      State

SR1   SYN             slf    Send ACK if SYN is for current incarnation.
SR2   SYN             slf    Send NACK if SYN is not for current incarnation.
SR3   SYN-Ver         slf    Send NACK referencing SYN-Verify.
SR4   Data            slf    Discard as out-of-order (or hold).
SR5   ACK             ES     If ACK refers to pending SYN-Verify.
                             Third part of 3 way handshake.
SR6   NACK            NA     If NACK refers to pending SYN-Verify.
                             SYN previously received was an old duplicate.
SR7   ACK, NACK,      slf    Ignore if does not refer to pending SYN-Verify.
      Reject
SR8   OPEN            slf    Ignore since already in progress.
SR9   Retrans         slf    Retransmit pending SYN-Verify.
SR10  Quit            slf    Notify local process.
SR11  Retry           slf    Ignore since other side has already started.

slf = self, ES = Established, NA = Not Active

Table A-3

3-Way-Handshake Protocol State Transitions
from SYN Sent State

Name  Event           Next         Action and Context
                      State

SS1   SYN             [illegible]  Set collision retry timer.
SS2   SYN-Ver         ES           Send ACK if SYN-Verify refers to pending SYN.
                                   Second part of 3 way handshake.
SS3   SYN-Ver         slf          [illegible]
SS4   Data            slf          Discard as out-of-order (or hold).
SS5   ACK             slf          Ignore old ACK.
SS6   NACK            slf          [illegible]
SS7   Reject          NA           [illegible]
SS8   Reject          slf          Ignore if Reject does not refer to
                                   pending SYN.
SS9   OPEN, Retry     slf          Ignore since already in progress.
SS10  Retrans         slf          Retransmit pending SYN packet.
SS11  Quit            slf          Notify local process.

slf = self, NA = Not Active, ES = Established

Table A-4

3-Way-Handshake Protocol State Transitions
from Established State

Name  Event           Next   Action and Context
                      State

ES1   SYN             slf    Send NACK referencing SYN.
ES2   SYN-Ver         slf    Send ACK if for current incarnation.
ES3   SYN-Ver         slf    Send NACK if for previous incarnation.
ES4   Data, ACK,      slf    Handle data communication as described
      Quit, Retrans          in section II-4 for SPAR protocol.
ES5   NACK, Reject    slf    Ignore old duplicates.
ES6   OPEN, Retry     slf    Ignore since accomplished.

slf = self

may also return Reject packets if a destination is unreachable
for various reasons (see chapter IV). Both of these
applications are conveniences that provide processes with more
information than simply timing out on their quit time.

As mentioned at the beginning of this appendix, we are
particularly interested in analyzing protocols under worst case
conditions. Therefore we assume that duplicate packets from
previous incarnations of a connection may be held in the
transmission medium and emerge during or after establishment of
the current incarnation. This means that all packet arrival
events can occur in every state, even though some packet types
would not appear if the transmission medium delivered packets in
order. This assumption precludes analysis techniques which
depend on in-order packet transmission (cf section II-1.2).

Sequence numbers are used throughout the protocol to
uniquely identify packets. Sections II-4 and II-5 have presented
constraints for assigning sequence numbers to packets, and
demonstrated the serious errors that can occur when these
constraints are violated. As a basis for the correctness proof
in this appendix, we assume that window size, transmission rate,
and initial sequence number selection constraints are obeyed.
This guarantees that on any connection there is only one
(different) packet with a particular sequence number.


A.2 Composite State Model

Construction of a composite state diagram from the
protocol machine in section A.1 is primarily a mechanical
process. Each protocol machine transition applicable to either
of the processes in the composite state becomes a composite
state transition. Any packets generated as part of the action
of the transition are added to the current packets in the
composite state. Packets are removed from the composite state
when they are no longer "current." A packet is current if
either:

(1) The packet is pending (waiting for retransmission) at
the sender. Normally this condition holds until the sender
receives some form of acknowledgement. For packets which
are not retransmitted (ACK, NACK, Reject) it does not hold.

(2) The packet refers to a current packet traveling in the
opposite direction (e.g. ACK, NACK, or Reject of another
packet). When the opposite packet is no longer current,
both packets are removed from the composite state.

Limiting packets explicitly considered part of the
composite state to these current packets is possible because we
assume that any "old" packet (including those removed from the
composite state) may arrive at any time. If the protocol
performs correctly under this worst case assumption, it will
also perform correctly under conditions where only a subset of
all possible old packets may arrive late and out-of-order.

Once a packet is generated by a protocol transition, it
remains in the composite state until it is no longer current,
despite the assumption that the transmission medium can lose or
damage packets. Since every current packet is either being
retransmitted, or is a response to a packet being retransmitted,
we can assume that current packets are always available to cause
transitions. There are no time limits on transitions occurring
in the model. In reality, packets may temporarily "disappear"
from the composite state if lost or damaged, but will always
reappear due to retransmissions.


Figure A-2 shows the composite state diagram for the
connection establishment protocol defined in section A.1.
Symmetric states (identical except for switching process
identities and packet directions) have been eliminated to
simplify the figure. Transitions to the same state such as
retransmissions are not shown. Composite transitions resulting
from simultaneous transitions of both protocol machines are
perfectly legal, but are shown as sequential individual
transitions to reduce the number of arrows.


Each composite state is represented by a pair of process
states and a list of current packets. Some context is
represented along with the basic state of each process. This
consists of the sequence number for outgoing packets in the SYN
Sent, SYN Received, and Established states, and also the
sequence number for incoming packets in the Established state.
This allows us to determine whether the protocol has correctly
initialized sequence numbers when the Established state is
reached.

Current packets are represented by their event names,
with a subscript giving their own sequence number if relevant,
followed by the sequence number of another packet they may refer
to (in parentheses). An arrow above the packet shows its
direction of travel. Thus

    $\overrightarrow{\mathrm{SYNV}_y}\,(x)$

represents a SYN-Verify packet with sequence number y, referring
to another packet with sequence number x, and traveling from
left to right.
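The notation can be mirrored in a small data structure. The following is a minimal sketch, not part of the thesis: the class names `Packet` and `CompositeState`, the field names, the event name "SYNV", and the textual rendering (an arrow prefix in place of the arrow drawn above the packet) are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Packet:
    event: str                  # event name, e.g. "SYN", "SYNV", "ACK", "Data"
    seq: Optional[str] = None   # the packet's own sequence number, if relevant
    ref: Optional[str] = None   # sequence number of the packet it refers to
    to_right: bool = True       # direction of travel (left-to-right if True)

    def render(self) -> str:
        text = self.event
        if self.seq is not None:
            text += "_" + self.seq
        if self.ref is not None:
            text += "(" + self.ref + ")"
        return ("->" if self.to_right else "<-") + text


@dataclass(frozen=True)
class CompositeState:
    left: str                   # left process state plus context, e.g. "SS_x"
    right: str                  # right process state plus context
    packets: FrozenSet[Packet] = frozenset()

    def render(self) -> str:
        shown = ", ".join(sorted(p.render() for p in self.packets))
        return f"({self.left})({self.right})" + (f"({shown})" if shown else "")


example = Packet("SYNV", seq="y", ref="x", to_right=True)
print(example.render())                      # ->SYNV_y(x)
print(CompositeState("NA", "NA").render())   # (NA)(NA)
```

Making both classes immutable lets composite states be stored in sets, which is convenient when enumerating reachable states.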


A.3 Correctness Under Normal Operation

Figure A-2 presents all composite states reachable if
both protocol machines start in the Not Active state and
function according to their definition (no failures). Several
important results emerge.

There are no terminal states with one process
Established and the other not established. The only terminal
states have both processes Not Active (if a connection was
rejected) or both processes Established. Furthermore, when both
processes are Established, sequence numbers for both directions
are properly initialized as described in section II-5.2. There
is no deadlock in either connection establishment or subsequent
exchange of data.

All paths leading back to the Not Active state (except
the reject path) for either process involve collisions which
will cause a later retry to establish the connection. Assuming
that perpetual collisions can be avoided, and that the
transmission medium provides a nonzero probability of delivering
any packet, the protocol will eventually succeed in establishing
a connection (unless the attempt is rejected).
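The eventual-success claim can be made concrete with a short calculation. As an illustrative assumption that the text does not state, suppose each transmission of a given packet is delivered independently with a fixed probability p > 0. The probability that none of n transmissions arrives then vanishes as n grows, and the expected number of transmissions until the first delivery is finite:

```latex
\Pr[\text{no delivery in } n \text{ attempts}] = (1-p)^{n} \longrightarrow 0
\quad (n \to \infty),
\qquad
\mathbb{E}[\text{attempts until first delivery}] = \frac{1}{p}.
```

Under that assumption, each packet needed on a path to the Established state is delivered after finitely many retransmissions with probability 1. The separate requirement that perpetual collisions be avoided is not covered by this calculation.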

These results show the sufficiency of the connection
establishment mechanisms embodied in the protocol machine of
section A.1. Their necessity is demonstrated by theorems 4-8 in
section II-5. We reprove theorem 6 in section A.5 below, using
the composite state formalism to show that simpler connection
establishment mechanisms fail under our assumptions of worst
case transmission medium behavior.


A.4 Consequences of Protocol Failures


Section A.3 has analyzed connection establishment under
normal operating conditions where both protocol machines start
in the Not Active state. In this section we discuss the
consequences of protocol failures and the problem of
reestablishing a connection after a failure. We describe one
mechanism to facilitate recovery from a common type of protocol
failure. The general question of failure recovery and
self-stabilizing systems [Dijkstra74] requires further research.

A common protocol failure occurs when the machine at one
side of a connection "crashes," losing state information about
the connection. As discussed in section II-3, it is impossible
to guarantee the recovery of data in transit at the time of the
failure, but it is desirable that the protocol should quickly
detect the failure and reinitialize the connection for reliable
communication after the failure.

After a protocol failure, we assume that all connections
are placed in the Not Active state. The other side of a
previously active connection may still be in the Established
state. The composite state of the system will then be (NA)(ES),
with some current packets possibly still being transmitted from
right to left. There is no such composite state in figure A-2,
indicating that this "half open" connection state cannot occur
in normal protocol operation. With the current protocol
specification in tables A-1 to A-4, this state also has no exit:
once reached due to a failure, it is permanent. If the
Established side sends data to the Not Active side, it is
discarded as out-of-order (transition NA4). If the Not Active
side attempts to reestablish the connection by sending a SYN
packet, the Established side returns a negative acknowledgement,
thinking the SYN is an old duplicate (transition ES1).


To break this stalemate, we introduce two additional
protocol machine transitions:

FF1: In the Not Active state, if a data packet arrives, discard
     it, but return a Reject packet referencing the data packet's
     sequence number.

FF2: In the Established (or SYN Received) state, if a Reject
     packet arrives and it references a pending packet, go to the
     Not Active state and notify the local process that the
     connection is being restarted (after possible loss of
     pending data). Then reestablish the connection.


FIGURE A-3 ADDITIONAL COMPOSITE STATE TRANSITIONS
FOR FAILURE RECOVERY

(NA)(ES)(<-Data_x)  --->  (NA)(ES)(<-Data_x, ->Reject(x))  --->  (NA)(NA)


Figure A-3 shows the additions to the composite state
diagram resulting from these two transitions. Without these
added transitions, the Established side would eventually exceed
its Quit time without knowing why. With these transitions, the
protocol will restart itself (with possible loss of pending
data) as long as the Established side tries to send some data
after the failure. However, if the Established side is passive
(waiting to receive data), the stalemate persists. In [Cerf74b]
(TCP), a Reset control packet is suggested to force return to
the Not Active state in half open connections, but the later
duplicate arrival of a Reset may prove dangerous.

A.5 Inadequacy of Simple Protocol

Section II-5 informally demonstrated the inadequacy of
simple connection establishment mechanisms in a hostile
transmission environment. Here we use composite state analysis
to reprove theorem 6.

The state diagram for a simple connection establishment
protocol is shown in figure A-4. Arriving SYN packets are
simply accepted and acknowledged if the protocol is ready to
establish a connection (i.e. in the Not Active or SYN Sent
states), or discarded as duplicates if a SYN has already been
accepted (i.e. SYN Received or Established states). The
protocol does not check to be sure that an arriving SYN packet
is current. Collisions no longer occur since simultaneously
transmitted SYN packets serve as responses to each other.

Events are a subset of those in the 3 way handshake
protocol since no SYN-Verify or NACK packets exist. The Reject
facility has also been removed to simplify the analysis. Table
A-5 defines the complete set of transitions and actions for this
simple protocol machine.

FIGURE A-4 SIMPLE CONNECTION ESTABLISHMENT PROTOCOL MACHINE

                            Table A-5

          State Transitions for Simple Protocol Machine

Current                 Next
State    Event          State   Action and Context

NA       SYN            SR      Accept SYN. Pick ISN and send SYN
                                packet with ACK of received SYN.
         OPEN           SS      Pick ISN and send SYN packet.
         ACK, Quit,     Self    Ignore since no packets pending.
         Data, Retrans

SS       SYN            SR      Accept SYN packet.
                                Send ACK referencing received SYN.
         OPEN           Self    Ignore since already in progress.
         Retrans        Self    Retransmit pending SYN packet.
         Quit           Self    Notify local process.
         ACK, Data      Self    Discard packets.

SR       SYN            Self    Ignore SYN packet.
         ACK            ES      If ACK refers to pending SYN.
         ACK            Self    If ACK does not refer to pending SYN.
         Data           Self    Discard as out-of-order (or hold).
         OPEN           Self    Ignore since in progress.
         Retrans        Self    Retransmit pending SYN packet.
         Quit           Self    Notify local process.

ES       SYN            Self    Ignore SYN packet.
         ACK, Data,     Self    Handle data communication as described
         Quit, Retrans          in section II-4 (SPAR protocol).
         OPEN           Self    Ignore since already in progress.

NA = Not Active, SS = SYN Sent, SR = SYN Received, ES = Established
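Table A-5 is small enough to encode directly as a lookup table. The sketch below is one possible encoding, not the thesis's own: the names `TABLE`, `step`, and `run`, and the boolean flag standing for "ACK refers to pending SYN", are assumptions, and actions such as sending packets are left out so that only the state changes are shown.

```python
STATES = {"NA", "SS", "SR", "ES"}
EVENTS = {"SYN", "OPEN", "ACK", "Quit", "Data", "Retrans"}

# Only the entries of Table A-5 that change state; every other
# (state, event) pair is a "Self" transition.
TABLE = {
    ("NA", "SYN"): "SR",
    ("NA", "OPEN"): "SS",
    ("SS", "SYN"): "SR",
}


def step(state, event, ack_refers_to_pending_syn=False):
    """Return the next state of one simple protocol machine."""
    if state not in STATES or event not in EVENTS:
        raise ValueError(f"unknown state or event: {state!r}, {event!r}")
    if state == "SR" and event == "ACK":
        # The only conditional entry in the table.
        return "ES" if ack_refers_to_pending_syn else "SR"
    return TABLE.get((state, event), state)


def run(events, state="NA"):
    """Apply a sequence of (event, flag) pairs and return the final state."""
    for event, flag in events:
        state = step(state, event, flag)
    return state


# Active open: OPEN, then the peer's SYN, then an ACK of our own SYN.
print(run([("OPEN", False), ("SYN", False), ("ACK", True)]))   # ES
# A SYN arriving in the Established state is ignored, whatever its age.
print(step("ES", "SYN"))                                       # ES
```

Note that no entry inspects the sequence number of an arriving SYN, which is the property the following analysis exploits.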




FIGURE A-5 COMPOSITE STATE DIAGRAM FOR SIMPLE PROTOCOL

Figure A-5 shows the resulting composite state diagram
using the same notation as section A.3. Assuming the protocol
starts with both processes Not Active, three terminal states are
possible. Correct connection establishment (state labeled "a")
occurs if no old SYN packets emerge from the transmission medium
while initialization is in progress. However, if one process
receives an old SYN packet at an inopportune moment during
connection establishment (see figure 9 in chapter II), that side
of the connection will be established with incorrectly
initialized sequence numbers (state labeled "b"). Old duplicate
data following the old SYN may be accepted in this state. The
other side of the connection will remain in the SYN Sent state
because the Established side discards arriving SYN packets. If
both processes receive old SYN packets during connection
establishment, both processes may be trapped in the SYN Sent
state and communication will be blocked in both directions
(state labeled "c").
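The search for terminal composite states can be sketched in code. The sketch rests on several assumptions not taken from the thesis: the processes are called L and R, their fresh initial sequence numbers are x and y, one old duplicate SYN per direction (old_x, old_y) is present in the medium, current packets are never removed (standing in for retransmission), the "SYN packet with ACK" sent from Not Active is modelled as two separate packets, and Data, Quit, and Retrans events are omitted. The `classify` helper assigns the labels a, b, c from the prose descriptions above rather than from the nodes of figure A-5, and the process-state names recorded in its terminal states follow this simplified encoding, not the figure.

```python
from collections import deque

ISN = {"L": "x", "R": "y"}      # fresh initial sequence numbers (assumed names)
PEER = {"L": "R", "R": "L"}
# Old duplicate SYNs from an earlier connection: (kind, number, destination).
OLD_SYNS = frozenset({("SYN", "old_x", "R"), ("SYN", "old_y", "L")})


def deliver(proc, who, packet):
    """Apply one arriving packet to process `who`; return (proc, emitted)."""
    name, out_seq, in_seq = proc
    kind, number, _dest = packet
    if kind == "SYN" and name == "NA":
        # Accept the SYN, pick an ISN, send own SYN and an ACK of the SYN.
        sent = {("SYN", ISN[who], PEER[who]), ("ACK", number, PEER[who])}
        return ("SR", ISN[who], number), sent
    if kind == "SYN" and name == "SS":
        return ("SR", out_seq, number), {("ACK", number, PEER[who])}
    if kind == "ACK" and name == "SR" and number == out_seq:
        return ("ES", out_seq, in_seq), set()
    return proc, set()          # every other arrival is ignored or discarded


def successors(state):
    """All composite states reachable in one step that differ from `state`."""
    left, right, packets = state
    procs = {"L": left, "R": right}
    result = set()
    for who in ("L", "R"):
        moves = []
        if procs[who][0] == "NA":   # OPEN request from the local process
            moves.append((("SS", ISN[who], None),
                          {("SYN", ISN[who], PEER[who])}))
        moves += [deliver(procs[who], who, p) for p in packets if p[2] == who]
        for new_proc, emitted in moves:
            updated = dict(procs)
            updated[who] = new_proc
            nxt = (updated["L"], updated["R"], packets | frozenset(emitted))
            if nxt != state:
                result.add(nxt)
    return result


def terminal_states(old_packets):
    """Breadth-first search from (NA)(NA); return states with no exit."""
    start = (("NA", None, None), ("NA", None, None), frozenset(old_packets))
    seen, queue, terminals = {start}, deque([start]), []
    while queue:
        state = queue.popleft()
        following = successors(state)
        if not following:
            terminals.append(state)
        for nxt in following - seen:
            seen.add(nxt)
            queue.append(nxt)
    return terminals


def classify(state):
    established = [p for p in state[:2] if p[0] == "ES"]
    if any(p[2].startswith("old") for p in established):
        return "b"              # a side is Established on an old SYN
    if len(established) == 2:
        return "a"              # both Established, sequence numbers correct
    return "c" if not established else "other"


print(sorted({classify(s) for s in terminal_states(OLD_SYNS)}))     # ['a', 'b', 'c']
print(sorted({classify(s) for s in terminal_states(frozenset())}))  # ['a']
```

With no old SYN packets in the medium the search finds only the correct terminal state; with them, all three kinds appear.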

This analysis duplicates the results obtained informally in
theorem 6 that such a "credulous" connection establishment
protocol is inadequate for use with hostile transmission media.
Our analysis has found both an illegal state (state b allows old
data to be delivered again) and a deadlock (state c is a
terminal state with each process requiring a response from the
other that is not forthcoming).

ARPANET: Heart70, McQuillan72, Frank70, Ornstein72, Roberts72,
Carr70, Crocker72, Crowther75, Kleinrock74a

CYCLADES: Pouzin73b, Zimmermann75

EPSS: Beeforth72, Bright74, Bright75

TYMNET: Beere71, Combs73, Tymes71

ALOHANET: Abramson70, Abramson73, Kuo73

PRNET: Burchfiel75, Kahn75, and papers from PRNET session in
Proc. National Computer Conf., 1975, AFIPS Press.

UCL: Higginson75, Lloyd75b, Stokes75

DCS: Farber72, Farber72a, Farber73, Farber73a, Rowe73,
Rowe75
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References  are  cited  by  the  principal  author's  last  name 
followed by the year of publication (and a letter to distinguish
multiple publications from the same year).

INWG Notes are available from Technical Committee 6.1 of
IFIP for the cost of reproduction: IFIP WG6.1, Vinton Cerf,
Chairman, Digital Systems Lab, Stanford University, Stanford,
Ca. 94305.

RFC's (Requests for Comments) and NIC documents are
available from the Network Information Center in limited
numbers: Elizabeth (Jake) Feinler, Stanford Research Institute,
Network Information Center, 333 Ravenswood Av., Menlo Park, Ca.
94025.
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